HDFS Storage Balancer Part 1

In this tutorial, we will learn how to use the HDFS Storage Balancer effectively.

We will also walk through the main options that can be combined with the hdfs balancer command.

HDFS stores data under a "write once" model: once a file has been written, its existing contents cannot be modified in place, and only appends are allowed.

In production, blocks can end up unevenly distributed across the DataNodes in the cluster.

The probable reasons could be:

  1. Datanode Failure
  2. Network Lag
  3. Load Balancing issues
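
Before running the balancer, it can help to see how uneven the cluster actually is. The following is a minimal sketch that filters the output of hdfs dfsadmin -report down to one line per DataNode. The function name per_node_usage is made up for this example, and the sketch assumes the report prints a "Name:" line followed later by a "DFS Used%:" line for each DataNode, which may differ between Hadoop versions.

```shell
# per_node_usage: hypothetical helper, reads `hdfs dfsadmin -report` text on stdin
# and prints "<datanode address> <DFS used percent>" for each DataNode section.
# Assumes each DataNode section starts with "Name:" and contains "DFS Used%:".
per_node_usage() {
  awk '
    /^Name:/      { node = $2 }            # remember which DataNode we are in
    /^DFS Used%:/ {
      if (node != "") {                    # skip the cluster-wide summary line
        sub(/%$/, "", $3)                  # drop the trailing percent sign
        print node, $3
        node = ""
      }
    }
  '
}

# Usage on a live cluster (not run here):
#   hdfs dfsadmin -report | per_node_usage
```

A wide spread between the highest and lowest percentages is a sign that the balancer is worth running.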


To redistribute the blocks so that data is evenly spread across the cluster, seasoned Hadoop admins recommend running the Balancer at least once every 10 days on a cluster that is up 24/7/365, or once every 5 weeks on a cluster that is only brought up when processing is needed.

Steps to perform balancer:

Step 1: SSH to the NameNode machine using PuTTY or any equivalent tool

Step 2 – 1: Run the following command

hdfs balancer

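Step 2 – 2: Alternatively, set a threshold. The threshold is a percentage: the cluster is considered balanced when the disk utilization of every DataNode differs from the overall cluster utilization by no more than this value. The default threshold is 10.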

hdfs balancer -threshold 30
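
To make the threshold concrete, here is a simplified sketch of the comparison it implies. It is not the balancer's own code: the function name classify_nodes and the "name used capacity" input format are assumptions made for this example, and the real balancer uses finer-grained categories when it picks source and target nodes.

```shell
# classify_nodes THRESHOLD: hypothetical helper for illustration only.
# Input on stdin: one line per DataNode, "name used capacity" (same unit for both).
# A node counts as over- or under-utilized when its utilization differs from
# the cluster-wide utilization by more than THRESHOLD percentage points.
classify_nodes() {
  awk -v threshold="$1" '
    { n++; name[n] = $1; used[n] = $2; cap[n] = $3
      total_used += $2; total_cap += $3 }
    END {
      if (total_cap == 0) exit 1
      cluster = 100 * total_used / total_cap   # overall cluster utilization
      for (i = 1; i <= n; i++) {
        util = 100 * used[i] / cap[i]
        state = "balanced"
        if (util > cluster + threshold)      state = "over-utilized"
        else if (util < cluster - threshold) state = "under-utilized"
        printf "%s %.1f%% %s\n", name[i], util, state
      }
    }'
}

# Example: three made-up nodes, threshold 30, cluster utilization 50%
#   printf 'dn1 900 1000\ndn2 500 1000\ndn3 100 1000\n' | classify_nodes 30
```

With a threshold of 30 and a cluster utilization of 50%, only nodes above 80% or below 20% are treated as needing a move, so a smaller threshold gives a tighter balance at the cost of more data movement.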

Step 2 – 3: You can also raise the number of concurrent block moves to speed up the balancing process. This is done by setting the dfs.datanode.balance.max.concurrent.moves property in the hdfs-site.xml of the DataNodes.
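
As a rough sketch, the property entry in hdfs-site.xml could look like the fragment below. The value 50 is only an illustrative assumption, not a recommendation from this tutorial; pick a value that suits your disks and network.

```xml
<!-- Goes inside the <configuration> element of hdfs-site.xml on each DataNode. -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <!-- Illustrative value only (assumption); tune for your cluster -->
  <value>50</value>
</property>
```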


Please note that the default value of dfs.datanode.balance.max.concurrent.moves is 5 in older Hadoop 2.x releases; newer releases ship a higher default, so check hdfs-default.xml for your version. Once the property is set, you can apply it without restarting the DataNode service by running the following command:

hdfs dfsadmin -reconfig datanode <dn_addr>:<ipc_port> start


dn_addr is the datanode IP address/hostname

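ipc_port is the DataNode's IPC port, set by dfs.datanode.ipc.address (the default is 50020 in Hadoop 2.x and 9867 in Hadoop 3.x; port 50010 is the data transfer port, not the IPC port)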
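
The reconfiguration runs as a background task on the DataNode, so it is worth checking its status after starting it. Below is a small sketch of a wrapper for both calls. The function name reconfig_datanode and the HDFS_BIN override are made up for this example (HDFS_BIN exists only so the command can be dry-run with echo), and the host name and port 50020 are placeholders based on the Hadoop 2.x default IPC port.

```shell
# reconfig_datanode DN_ADDR IPC_PORT [ACTION]: hypothetical wrapper around
# `hdfs dfsadmin -reconfig datanode <dn_addr>:<ipc_port> <start|status|properties>`.
# HDFS_BIN is an assumed override for dry runs; it defaults to the real hdfs binary.
reconfig_datanode() {
  dn_addr="$1"
  ipc_port="$2"
  action="${3:-start}"          # default action is to start the reconfiguration
  "${HDFS_BIN:-hdfs}" dfsadmin -reconfig datanode "${dn_addr}:${ipc_port}" "$action"
}

# Usage on a live cluster (not run here; host and port are placeholders):
#   reconfig_datanode dn1.example.com 50020 start
#   reconfig_datanode dn1.example.com 50020 status
```

The status action reports whether the reconfiguration task has finished and which properties were changed.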



Prashant Nair

Bigdata Consultant | Author | Corporate Trainer | Technical Reviewer Passionate about new trends and technologies. More Geeky. Contact me for training and consulting !!!
