Building Apache Hadoop 2.8.0 from Scratch

Why this tutorial?

How many of you have successfully installed Hadoop using the binaries offered on the Apache Software Foundation website? And how many of you saw the following warning (WARN) message whenever you ran an HDFS CLI command?

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I am pretty sure most of you raised your hands. This is a very common question in my Hadoop Administration class, especially once I complete the single-node cluster installation and start demonstrating the CLI commands.

This happens because the binary distribution on the Apache Hadoop website ships with 32-bit native libraries, which is not considered feasible for production use: a 32-bit process can address at most 4 GB of RAM regardless of the hardware underneath.

Through this tutorial, I will show you how to build Apache Hadoop from scratch on a 64-bit system, so that the resulting distribution is fully 64-bit compatible.

What did I use?

  1. Ubuntu 14.04 Desktop LTS 64 bit OS
  2. Apache Hadoop 2.8.0 Source Tar file

Steps to build Apache Hadoop:

Step 1: Log in as root

sudo su

Step 2: Install all dependencies required for building Apache Hadoop

apt-get install software-properties-common
add-apt-repository ppa:george-edison55/cmake-3.x
apt-get update
apt-get install build-essential
apt-get install cmake
apt-get install subversion git
apt-get install zlib1g-dev
apt-get install libssl-dev
apt-get install ant

 

Step 3: Download and extract Apache Maven (I used apache-maven-3.5.0-bin.tar.gz). In case the below direct link doesn’t work, get the link from http://maven.apache.org/download.cgi

wget http://redrockdigimark.com/apachemirror/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz
tar -xvzf apache-maven-3.5.0-bin.tar.gz

 

Step 4: Set up the Maven environment and load it

vi /etc/profile.d/maven.sh

#Add the following lines in the file

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64/
export M3_HOME=/home/hadoop/apache-maven-3.5.0
export PATH=$PATH:$JAVA_HOME/bin:$M3_HOME/bin

#Save the file, then load it
source /etc/profile.d/maven.sh
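If you prefer a non-interactive setup, the same file can be written with a heredoc instead of vi. A minimal sketch; the JAVA_HOME and M3_HOME values are the ones used in this tutorial, so adjust them for your machine:

```shell
# Write the Maven environment file without opening an editor.
# Quoting 'EOF' stops the shell from expanding $PATH at write time,
# so the literal text lands in the file.
cat > /etc/profile.d/maven.sh <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64/
export M3_HOME=/home/hadoop/apache-maven-3.5.0
export PATH=$PATH:$JAVA_HOME/bin:$M3_HOME/bin
EOF
source /etc/profile.d/maven.sh
```

Afterwards, `mvn -version` should report Apache Maven 3.5.0 once the JDK and Maven directories actually exist.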

Step 5: Install OpenJDK 7 (you can also use the Oracle JDK); this provides the JAVA_HOME path referenced in Step 4

apt-get install openjdk-7-jdk

Step 6: Download and build Protocol Buffers 2.5.0 from its GitHub releases page (the Hadoop 2.8.0 build requires this exact version)

wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -xzvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
make install
ldconfig
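To confirm the install succeeded before moving on, check that protoc is on the PATH and reports the expected version. A small hedged check; /usr/local/bin is the configure default install location, not something this tutorial sets explicitly:

```shell
# Verify protoc is installed; the Hadoop 2.8.0 build needs libprotoc 2.5.0.
if command -v protoc >/dev/null 2>&1; then
  protoc --version
else
  echo "protoc not found: check /usr/local/bin and run ldconfig"
fi
```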

 

Step 7: Download and build the Hadoop source tar (I am using hadoop-2.8.0). In case the below direct link doesn’t work, get the link from http://hadoop.apache.org/releases.html

export LD_LIBRARY_PATH=/usr/local/lib
wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-2.8.0/hadoop-2.8.0-src.tar.gz
tar -xvzf hadoop-2.8.0-src.tar.gz
cd hadoop-2.8.0-src
mvn package -Pdist,native -DskipTests -Dtar
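In the command above, -Pdist,native activates the Maven profiles that assemble a full distribution with the native (64-bit) libraries, -DskipTests skips the lengthy unit-test phase, and -Dtar additionally packages the result as a tar.gz. The build itself takes a while, so it is worth checking up front that every tool from the earlier steps is actually on the PATH. A small sketch; the tool list mirrors the dependencies installed above:

```shell
# Pre-flight check before the long native build: report any tool that is
# missing from the PATH (all of them were installed in the earlier steps).
missing=0
for tool in mvn protoc cmake gcc make; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "missing: $tool"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "all build tools present"
fi
```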

 


On a successful build, Maven prints the location of the distribution tarball:

Hadoop distribution tar available at: /home/hadoop/hadoop-2.8.0-src/hadoop-dist/target/hadoop-2.8.0.tar.gz
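Once you extract that tarball on a node, you can confirm the native libraries really are 64-bit. A hedged sanity check; the path assumes the default layout inside the generated tarball:

```shell
# A 64-bit build should report "ELF 64-bit LSB shared object, x86-64 ...".
LIB=hadoop-2.8.0/lib/native/libhadoop.so.1.0.0
if [ -f "$LIB" ]; then
  file "$LIB"
else
  echo "extract the built hadoop-2.8.0.tar.gz first, then re-run this check"
fi
```

Alternatively, after configuring the cluster, `hadoop checknative -a` lists which native libraries Hadoop was able to load.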

You can now pat yourself on the back and proudly say that you know how to build your own Hadoop distribution.

How will this benefit you?

Following are the things you get when you build Hadoop on a 64-bit machine:

  1. The WARN message no longer appears, since Hadoop now loads the correct 64-bit native libraries
  2. Each node can use more than 4 GB of RAM

Hope you liked this tutorial. Please add your comments below. Thanks!

 

Prashant Nair

Big Data Consultant | Author | Corporate Trainer | Technical Reviewer. Passionate about new trends and technologies. More geeky. Contact me for training and consulting!
