#--------------------------------Version bd-16s -------------------------
#---------------------------------------------------------
Course: Big Data Analytics
Instructor: Prof. Dr. Dr. Lars Schmidt-Thieme, Mohsan Jameel
Information Systems and Machine Learning Lab
University of Hildesheim
contact: mohsan.jameel@ismll.de

These guidelines describe how to install Hadoop as a standalone (single-node) setup on your laptop or a virtual machine.
Help: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

#-------------TOC-------------
A. INSTALLATION STEPS FOR HADOOP
B. EXECUTION STEPS
C. COMPILATION STEPS

#---------------------------------------------------------
A. INSTALLATION STEPS FOR HADOOP
#---------------------------------------------------------
1) Download Hadoop from the link below:
   http://ftp.fau.de/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

2) Prerequisites for running Hadoop:
   -> Make sure Java is installed on your machine, i.e. java and javac work on the command line.
      - if not installed:
        $sudo apt-get install openjdk-7-jdk
   -> The JAVA_HOME variable is set.
      - to check whether it is set, run:
        $echo $JAVA_HOME
      - if nothing is printed, you can set it in your .bashrc file using a text editor
        (the path to your Java installation may differ):
        $gedit ~/.bashrc
        and add:
        export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
        then reload it:
        $source ~/.bashrc
   -> Check that ssh is installed and passwordless authentication is set up.
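The three prerequisites above can be checked in one pass. A small sketch (my addition, not part of the original guide's commands; the BatchMode/ConnectTimeout flags are standard OpenSSH options used so a missing key is reported instead of a password prompt):

```shell
#!/bin/sh
# Sketch: verify the prerequisites from step 2 in one pass.
# Collects everything that is missing instead of stopping at the first problem.
MISSING=""

command -v java  >/dev/null 2>&1 || MISSING="$MISSING java"
command -v javac >/dev/null 2>&1 || MISSING="$MISSING javac"
[ -n "${JAVA_HOME:-}" ]          || MISSING="$MISSING JAVA_HOME"
# BatchMode makes ssh fail instead of prompting for a password, so a
# missing key is reported rather than hanging the script
ssh -o BatchMode=yes -o ConnectTimeout=2 localhost true >/dev/null 2>&1 \
                                 || MISSING="$MISSING passwordless-ssh"

if [ -z "$MISSING" ]; then
    echo "all prerequisites satisfied"
else
    echo "still missing:$MISSING"
fi
```

Anything the script lists under "still missing" can then be fixed with the corresponding step above.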
      $ssh localhost
      - if it asks for a password, follow http://www.linuxproblem.org/art_9.html OR run:
        $ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
        $cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
        $chmod 0600 ~/.ssh/authorized_keys

3) Extract hadoop-2.7.2.tar.gz into a directory, e.g. /home/<username>/hadoop:
   $tar -zxf hadoop-2.7.2.tar.gz

4) Change directory to hadoop-2.7.2.

5) Setting the Hadoop configuration files:
   a) add JAVA_HOME in etc/hadoop/hadoop-env.sh:
      $gedit etc/hadoop/hadoop-env.sh
      - add/update:
        # set to the root of your Java installation
        export JAVA_HOME=/usr/java/latest
      - check that hadoop is set up correctly:
        $bin/hadoop
   b) add the following to etc/hadoop/core-site.xml:
      $gedit etc/hadoop/core-site.xml
        <configuration>
            <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
            </property>
        </configuration>
   c) add the following to etc/hadoop/hdfs-site.xml:
      $gedit etc/hadoop/hdfs-site.xml
        <configuration>
            <property>
                <name>dfs.replication</name>
                <value>1</value>
            </property>
        </configuration>
   d) check whether the daemons are running (after starting them in section B):
      $jps
      output:
      9547 DataNode
      9388 NameNode
      9745 SecondaryNameNode
      16160 Jps

#---------------------------------------------------------
B. EXECUTION STEPS FOR HADOOP
#---------------------------------------------------------
1) Settings for executing a job:
   a) First you need to format the filesystem (doing this once is enough):
      $./bin/hdfs namenode -format
   b) Start the NameNode daemon and DataNode daemon:
      $./sbin/start-dfs.sh
      NameNode information is available at http://localhost:50070/
   c) Make the HDFS directories required to execute MapReduce jobs:
      $./bin/hdfs dfs -mkdir /user
      $./bin/hdfs dfs -mkdir /user/mohsan

2) Putting data files into HDFS:
   Whenever you want to run a job, you first need to put your data on HDFS. For that you can do the following
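The staging step can also be wrapped in a small re-runnable script. A sketch with names that are assumptions rather than fixed conventions: the DRY_RUN guard and run() helper are my additions, and mydata / input are example folder names:

```shell
#!/bin/sh
# Sketch: stage a local folder into HDFS before running a job (section B.2).
# With DRY_RUN=1 (the default here) every command is only printed; set
# DRY_RUN=0 inside a real hadoop-2.7.2 directory, with the daemons from
# step 1b running, to actually execute the transfer.
DRY_RUN="${DRY_RUN:-1}"
LOCAL_DIR="${LOCAL_DIR:-mydata}"   # example local folder, replace with your data

run() {
    echo "+ $*"                    # show each command before (possibly) running it
    [ "$DRY_RUN" = "1" ] || "$@"
}

run ./bin/hdfs dfs -mkdir -p input          # create the target folder in HDFS
run ./bin/hdfs dfs -put "$LOCAL_DIR" input  # copy the local folder into it
run ./bin/hdfs dfs -ls input                # verify the upload arrived
```

The dry-run default makes it safe to read the exact command sequence before touching a real cluster.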
   (for example, put the folder etc/hadoop into the input folder of HDFS):
   $./bin/hdfs dfs -put etc/hadoop input

3) Executing a job:
   a) To execute a job you need to provide the jar file with the class name, the input and output folders, and any additional parameters required by the program:
      $./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
   b) See the output:
      $./bin/hdfs dfs -cat output/*

4) WordCount example
   (tutorial: https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html)
   $./bin/hdfs dfs -mkdir wordcountinput
   $./bin/hdfs dfs -put ../example/Hadoop-WordCount/input/ wordcountinput
   $./bin/hdfs dfs -ls wordcountinput
   $./bin/hadoop jar ../example/Hadoop-WordCount/wordcount.jar WordCount wordcountinput wordcountoutput
   $./bin/hdfs dfs -cat wordcountoutput/*

#---------------------------------------------------------
C. COMPILATION STEPS
#---------------------------------------------------------
1) Compiling Hadoop code on the command line (note that the version number in the jar file names may differ; change it according to your installed version):
   $javac -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.2.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar WordCount.java

2) Package the compiled classes into a jar file
   (https://docs.oracle.com/javase/tutorial/security/toolsign/step2.html):
   $jar -cvf wordcount.jar WordCount*.class
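The compile and package commands can be assembled with the version number factored out, which makes upgrading less error-prone. A sketch; the fallback to the current directory when HADOOP_HOME is unset is my addition, and WordCount.java / wordcount.jar follow the WordCount example from section B.4:

```shell
#!/bin/sh
# Sketch: build the javac classpath for section C with the Hadoop version
# factored into one variable, then print the compile and package commands.
HADOOP_VERSION="${HADOOP_VERSION:-2.7.2}"
HH="${HADOOP_HOME:-.}"   # falls back to the current directory if HADOOP_HOME is unset

CP="$HH/share/hadoop/common/hadoop-common-$HADOOP_VERSION.jar"
CP="$CP:$HH/share/hadoop/mapreduce/hadoop-mapreduce-client-core-$HADOOP_VERSION.jar"
CP="$CP:$HH/share/hadoop/common/lib/commons-cli-1.2.jar"

# printed rather than executed, so the commands can be inspected first
echo "javac -classpath $CP WordCount.java"
echo "jar -cvf wordcount.jar WordCount*.class"
```

Changing HADOOP_VERSION once updates both Hadoop jar names in the classpath.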