HDFS copyFromLocal vs. put Command

“What’s the difference between copyFromLocal and Put command in HDFS CLI?”

A very common interview question, isn’t it?

Let’s try to figure out whether there is any notable difference between put and copyFromLocal. Both commands have a single objective: loading data from the local file system into HDFS.

Let’s demonstrate the functionality now.

Variation 1: Loading data from local file system and storing the same in HDFS

  • Source File Location: file:///home/hadoop/data1.txt
  • Destination File Location: hdfs:/data/data1.txt

Using copyFromLocal:

hdfs dfs -mkdir /data

hdfs dfs -copyFromLocal file:///home/hadoop/data1.txt hdfs:/data/data1.txt

Result: Success !!!

Using put:

hdfs dfs -mkdir /data1

hdfs dfs -put file:///home/hadoop/data1.txt hdfs:/data1/data1.txt

Result: Success !!!
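
Because both the source and the destination in this variation are written as URIs, it can help to see how such a string splits into scheme, authority and path. The following is only a minimal sketch using the JDK's java.net.URI (the class name UriParts and the helper describe are made up for illustration, and Hadoop's own path handling may differ in details). It suggests why a local path is normally written with three slashes after "file:": with only two, the first path segment is read as the authority (host).

```java
import java.net.URI;

// Illustration only: plain JDK URI parsing, not Hadoop's Path or PathData.
class UriParts {
    // Returns "scheme | authority | path" for the given URI string.
    static String describe(String s) {
        URI u = URI.create(s);
        return u.getScheme() + " | " + u.getAuthority() + " | " + u.getPath();
    }

    public static void main(String[] args) {
        String[] samples = {
            "file:///home/hadoop/data1.txt", // empty authority, full local path
            "file://home/hadoop/data1.txt",  // "home" is parsed as the authority
            "hdfs:/data/data1.txt"           // no authority given at all
        };
        for (String s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

In this sketch the two-slash form yields the path /hadoop/data1.txt, which is not the file that was meant.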

 

Variation 2: Loading multiple files in HDFS

  • Source Files Location: data1 & data2 (created in the current local directory)
  • Destination Folder Location: /data_hadoop2 (copyFromLocal) ; /put_data (put)

Using copyFromLocal:

echo "Hello" > data1

echo "Hi" > data2

hdfs dfs -mkdir /data_hadoop2

hdfs dfs -copyFromLocal data1 data2 /data_hadoop2

hdfs dfs -cat /data_hadoop2/data1

hdfs dfs -cat /data_hadoop2/data2

Result: Success !!!

Using put:

hdfs dfs -mkdir /put_data

hdfs dfs -put data1 data2 /put_data

Result: Success !!!

 

Variation 3:  Loading data from STDIN

  • Source Location: STDIN (text typed at the terminal, ended with Ctrl+D)
  • Destination Location: /put_data/stdin_data_1 ; /put_data/stdin_data_2

Using copyFromLocal:

hdfs dfs -copyFromLocal - /put_data/stdin_data_1

hdfs dfs -cat /put_data/stdin_data_1

Result: Success !!!

Using put:

hdfs dfs -put - /put_data/stdin_data_2

hdfs dfs -cat /put_data/stdin_data_2

Result: Success !!!
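
The reason a lone "-" works as a source can be sketched roughly as follows: when exactly one source argument is given and it is "-", the input stream is copied to the target instead of a file. This is a simplified stand-in written for illustration (the class StdinDispatch and its methods are hypothetical, and in-memory streams take the place of System.in and HDFS), not Hadoop's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the shell command's dispatch logic; not Hadoop code.
class StdinDispatch {
    // True when the only source argument is "-", meaning "read from STDIN".
    static boolean readsFromStdin(List<String> sources) {
        return sources.size() == 1 && sources.get(0).equals("-");
    }

    // Simulates one invocation: streams stdinText to an in-memory target when
    // the source is "-", otherwise reports the files that would be copied.
    static String run(List<String> sources, String stdinText) {
        if (readsFromStdin(sources)) {
            ByteArrayInputStream in =
                new ByteArrayInputStream(stdinText.getBytes(StandardCharsets.UTF_8));
            ByteArrayOutputStream target = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                target.write(buf, 0, n);
            }
            return "stream:" + new String(target.toByteArray(), StandardCharsets.UTF_8);
        }
        return "files:" + String.join(",", sources);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("-"), "Hello"));
        System.out.println(run(Arrays.asList("data1", "data2"), ""));
    }
}
```

Note that in this sketch "-" is only special when it is the single source; mixed with other sources it is treated as an ordinary name.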

 

My Final Conclusion:

Since all my commands worked without trouble, it was hard to identify any actual difference just by running them. There was also no noticeable difference in aspects such as performance.

Out of curiosity, I looked up the source code of CopyFromLocal and Put. This is what I found in the Hadoop SVN repository:

  /**
   *  Copy local files to a remote filesystem
   */
  public static class Put extends CommandWithDestination {
    public static final String NAME = "put";
    public static final String USAGE = "[-f] [-p] <localsrc> ... <dst>";
    public static final String DESCRIPTION =
      "Copy files from the local file system " +
      "into fs. Copying fails if the file already " +
      "exists, unless the -f flag is given. Passing " +
      "-p preserves access and modification times, " +
      "ownership and the mode. Passing -f overwrites " +
      "the destination if it already exists.\n";

    @Override
    protected void processOptions(LinkedList<String> args) throws IOException {
      CommandFormat cf = new CommandFormat(1, Integer.MAX_VALUE, "f", "p");
      cf.parse(args);
      setOverwrite(cf.getOpt("f"));
      setPreserve(cf.getOpt("p"));
      getRemoteDestination(args);
      // should have a -r option
      setRecursive(true);
    }

    // commands operating on local paths have no need for glob expansion
    @Override
    protected List<PathData> expandArgument(String arg) throws IOException {
      List<PathData> items = new LinkedList<PathData>();
      try {
        items.add(new PathData(new URI(arg), getConf()));
      } catch (URISyntaxException e) {
        if (Path.WINDOWS) {
          // Unlike URI, PathData knows how to parse Windows drive-letter paths.
          items.add(new PathData(arg, getConf()));
        } else {
          throw new IOException("unexpected URISyntaxException", e);
        }
      }
      return items;
    }

    @Override
    protected void processArguments(LinkedList<PathData> args)
    throws IOException {
      // NOTE: this logic should be better, mimics previous implementation
      if (args.size() == 1 && args.get(0).toString().equals("-")) {
        copyStreamToTarget(System.in, getTargetPath(args.get(0)));
        return;
      }
      super.processArguments(args);
    }
  }

  public static class CopyFromLocal extends Put {
    public static final String NAME = "copyFromLocal";
    public static final String USAGE = Put.USAGE;
    public static final String DESCRIPTION = "Identical to the -put command.";
  }

As you can see from the code of both Put and CopyFromLocal, CopyFromLocal extends Put and only redefines NAME, USAGE and DESCRIPTION. All functionality of Put is therefore available in copyFromLocal, which has no special features of its own.

In this version of the source, there is literally no difference between the two commands.
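
The inheritance argument can be checked with a small stand-alone sketch. The classes below are hypothetical stand-ins that only mirror the shape of the quoted source (a subclass that redeclares a name constant and overrides nothing); they are not Hadoop's classes, and the run method is an invented placeholder for the real copy logic.

```java
// Hypothetical stand-ins that mirror only the shape of the quoted source.
class InheritanceDemo {
    static class Put {
        static final String NAME = "put";

        // Placeholder for the real copy logic.
        String run(String src, String dst) {
            return "copied " + src + " to " + dst;
        }
    }

    // Redeclares only the name constant; no method is overridden.
    static class CopyFromLocal extends Put {
        static final String NAME = "copyFromLocal";
    }

    public static void main(String[] args) {
        String viaPut = new Put().run("data1", "/put_data");
        String viaCopy = new CopyFromLocal().run("data1", "/put_data");
        // Both calls execute the same inherited method body.
        System.out.println(Put.NAME + ": " + viaPut);
        System.out.println(CopyFromLocal.NAME + ": " + viaCopy);
    }
}
```

Since the subclass declares no methods of its own, every call on it runs the parent's code, which matches what the three variations showed.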

 

Let me know your views on this. I hope you found this post interesting!

Prashant Nair

Bigdata Consultant | Author | Corporate Trainer | Technical Reviewer. Passionate about new trends and technologies. More Geeky. Contact me for training and consulting!
