Why this tutorial?
How many of you have tried installing Hadoop using binaries offered by Apache Software Foundation website successfully? How many of you got the following warning (WARN) message whenever you performed any HDFS CLI commands?
I am pretty most of you raised your hands. This is a very common question in my Hadoop Administration class, especially once I complete my single node cluster installation and start demonstrating the CLI Commands.
This happens because Apache Hadoop website officially provides 32-bit binary distribution which is not considered feasible for production use. The maximum amount of RAM usable irrespective of hardware provided is 4 GB; thus making it unfeasible.
Through this tutorial, I will show you how to build Apache Hadoop from Scratch in a 64-bit system to make the distribution 64-bit compatible.
What I used?
- Ubuntu 14.04 Desktop LTS 64 bit OS
- Apache Hadoop 2.8.0 Source Tar file
Steps to build Apache Hadoop:
Step 1: Login to root
Step 2: Install all dependencies required for building Apache Hadoop
apt-get install build-essential
apt-get install software-properties-common
apt-get install cmake
apt-get install subversion git
apt-get install zlib1g-dev
apt-get install libssl-dev
apt-get install ant
Step 3: Download and extract Apache Maven (I used apache-maven-3.5.0.tar.gz). In case the below direct link doesn’t work, get the link from http://maven.apache.org/download.cgi
tar -xvzf apache-maven-3.5.0-bin.tar.gz
Step 4: Setup Maven environment and update the same
vi /etc/profile.d/maven.sh #Add the following lines in the file export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64/ export M3_HOME=/home/hadoop/apache-maven-3.5.0 export PATH=$PATH:$JAVA_HOME/bin:$M3_HOME/bin
Step 5: Install OpenJDK7 (You can also use Oracle JDK)
apt-get install openjdk-7-jdk
Step 6: Download and Build Protocol Buffers from Google Developers website
tar -xzvf protobuf-2.5.0.tar.gz
Step 8: Download Hadoop Source tar ( I am using hadoop-2.8.0 ). In case the below direct link doesn’t work, get the link from http://hadoop.apache.org/releases.html
tar -xvzf hadoop-2.8.0-src.tar.gz
mvn package -Pdist,native -DskipTests -Dtar
Hadoop distribution tar available at: /home/hadoop/hadoop-2.8.0-src/hadoop-dist/target/hadoop-2.8.0.tar.gz
You can now pat your back and proudly say that you know how to build your own distro for Hadoop.
How will this benefit you?
Following are the things you get when you build Hadoop in a 64-bit machine:
- Warning message not available anymore since Hadoop now has correct 64-bit libraries
- Can use more than 4GB RAM per node
Hope you liked this tutorial. Please add your comments below. Thanks!