Setting up Hadoop-2.8.0 in Standalone Mode

In this section, we will learn how to set up Hadoop in Standalone mode. In this mode, your local file system is used as the storage and a single JVM performs all MapReduce-related operations. Let us see how to set up the CLI MiniCluster:

Step 1: Ensure package lists are updated.

sudo apt-get update

Step 2: Install Java 7. We are going to use OpenJDK 7, but feel free to use Oracle JDK 7 instead.

sudo apt-get install openjdk-7-jdk

java -version
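If you are unsure of the exact JDK installation path (needed for JAVA_HOME in the later steps), the following sketch can help you locate it. The glob pattern assumes the Debian/Ubuntu JVM layout used throughout this post; the directory name may differ on your system:

```shell
# List installed OpenJDK directories; the exact name may vary by distro
ls -d /usr/lib/jvm/*openjdk* 2>/dev/null

# Or resolve the real path behind the `java` command
readlink -f "$(which java)"
```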

Step 3: Install SSH

sudo apt-get install openssh-server

Step 4: Extract the Hadoop binary tarball that you built and copied into the home folder. In case you missed how to create your own binaries, you can refer to my post on Building Apache Hadoop 2.8.0 from Scratch.

tar -xvzf hadoop-2.8.0.tar.gz

Step 5: Rename the extracted folder. This is done purely for convenience.

mv hadoop-2.8.0 hadoop2

Step 6: Set up environment variables so the shell can locate the Hadoop executables, configurations, and dependencies. You will need to edit the .bashrc file in the home folder.

vi .bashrc

#Add the lines below at the start of the file.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export HADOOP_INSTALL=/home/hadoop/hadoop2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
export YARN_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
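Before reloading the shell, it is worth sanity-checking that the new entries actually made it into .bashrc; a typo here is the most common cause of "command not found" later. A minimal check:

```shell
# Show the Hadoop-related lines just added, to catch typos before reloading
grep -E 'HADOOP|YARN|JAVA_HOME' ~/.bashrc
```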

Step 7: Reload the shell to apply the environment variables.

exec bash
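Once the shell has been reloaded, the hadoop launcher should resolve from PATH. A quick sanity check, assuming the variables from Step 6:

```shell
# The variable should now be set in the new shell
echo "$HADOOP_INSTALL"

# The hadoop launcher should resolve from PATH and report its version
which hadoop
hadoop version
```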

Step 8: Tell Hadoop where Java is installed.

vi /home/hadoop/hadoop2/libexec/hadoop-config.sh

#Add the following line at the start of the file
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

Step 9: Set up Hadoop environment variables. These are used mainly by the HDFS shell scripts in the sbin directory of the Hadoop distribution.

vi /home/hadoop/hadoop2/etc/hadoop/hadoop-env.sh

#Add the following lines at the start of the file
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export HADOOP_INSTALL=/home/hadoop/hadoop2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"

Step 10: Set up YARN environment variables. These are used mainly by the YARN shell scripts in the sbin directory of the Hadoop distribution.

vi /home/hadoop/hadoop2/etc/hadoop/yarn-env.sh

#Add the following lines at the start of the file
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop2
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

After setting up all the environment variables, Hadoop is now installed in Standalone mode. Let us test the installation. To do so, open two PuTTY sessions: one for monitoring JVMs using the jps command, and the other for executing a MapReduce application in Standalone mode.

Step 11: We will use the examples JAR provided with the framework, available in the /home/hadoop/hadoop2/share/hadoop/mapreduce folder. In the first session, run the following command:

watch -n 1 jps

This runs the jps command every second, letting you monitor any newly created JVMs. In the second session, run the following command to execute the WordCount program:

hadoop jar /home/hadoop/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount /home/hadoop/hadoop2/README.txt /home/hadoop/OutputWC

The syntax is as follows:

hadoop jar <jar_file> <prog_name> <input> <output>

where,

jar_file – the path of the MapReduce JAR to be executed on the Hadoop cluster

prog_name – the name of the example program whose class holds the main method (here, wordcount)

input – the input file or folder to be processed

output – the folder where all the output will be stored; it must not already exist
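When the job finishes, the results land in the output folder given on the command line. You can inspect them as follows; this is a sketch assuming the paths used above, and part-r-00000 is the standard name MapReduce gives the reducer output file:

```shell
# List the job output; the _SUCCESS marker indicates the job completed
ls /home/hadoop/OutputWC

# Print the word counts produced by the reducer
cat /home/hadoop/OutputWC/part-r-00000
```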

 

While the program is running, observe the first PuTTY session. You will see that a new JVM has been invoked; this single JVM is solely responsible for executing your MapReduce program. Officially, the Hadoop documentation calls this setup the CLI MiniCluster, in which all the required operations are performed in a single JVM.

Now you have learned how to set up a CLI MiniCluster. One of the most frequently asked questions about this setup is, "Why do we need to learn this? Nobody uses this kind of setup in production anymore." You might also wonder what benefit you gain from it. The answer to both questions is that to set up a multi-node cluster, you first need to set up Hadoop in Standalone mode on each node participating in the cluster.

Prashant Nair
