
Difference between hadoop fs -put and hadoop fs -copyFromLocal

Tags:

hadoop

hdfs

-copyFromLocal is similar to the -put command, except that the source is restricted to a local file reference.

So basically, anything you can do with -copyFromLocal you can also do with -put, but not vice versa.

Similarly,

-copyToLocal is similar to the -get command, except that the destination is restricted to a local file reference.

Hence, you can use -get instead of -copyToLocal, but not the other way round.

Reference: Hadoop's documentation.

Update: For the latest as of Oct 2015, please see this answer below.


Let's take an example. Suppose your HDFS contains the path /tmp/dir/abc.txt and your local disk contains the same path. The HDFS API then cannot know which one you mean unless you specify a scheme such as file:// or hdfs://, and it may pick the path you did not want to copy.

That is why -copyFromLocal exists: it prevents you from accidentally copying the wrong file by restricting the source parameter to the local filesystem.

-put is for more advanced users who know which scheme to put in front of the path.
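
To illustrate the role of the scheme, here is a minimal Java sketch using only java.net.URI. It is not Hadoop's actual resolution code: the class name SchemeSketch, the qualify helper, and the default filesystem hdfs://namenode:8020/ are all assumptions made for illustration (the last one stands in for whatever default filesystem a cluster is configured with). It shows the general idea that a path with an explicit scheme is taken as-is, while a scheme-less path is resolved against a default.

```java
import java.net.URI;

public class SchemeSketch {
    // Hypothetical default filesystem; a real cluster defines its own.
    static final URI DEFAULT_FS = URI.create("hdfs://namenode:8020/");

    // An explicit scheme wins; a scheme-less path is resolved
    // against the (assumed) default filesystem.
    static URI qualify(String path) {
        URI uri = URI.create(path);
        return uri.getScheme() != null ? uri : DEFAULT_FS.resolve(uri);
    }

    public static void main(String[] args) {
        // Same path string, three different meanings depending on the scheme.
        System.out.println(qualify("/tmp/dir/abc.txt"));
        System.out.println(qualify("file:///tmp/dir/abc.txt"));
        System.out.println(qualify("hdfs://namenode:8020/tmp/dir/abc.txt"));
    }
}
```

In this sketch the bare /tmp/dir/abc.txt ends up on the assumed hdfs:// default, which is the kind of surprise the scheme prefix lets you avoid.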

New Hadoop users are often confused about which filesystem they are currently in and where their files actually are.


Despite what is claimed by the documentation, as of now (Oct. 2015), both -copyFromLocal and -put are the same.

From the online help:

[cloudera@quickstart ~]$ hdfs dfs -help copyFromLocal 
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
  Identical to the -put command.

And this is confirmed by looking at the sources, where you can see that the CopyFromLocal class extends the Put class, but without adding any new behavior:

  public static class CopyFromLocal extends Put {
    public static final String NAME = "copyFromLocal";
    public static final String USAGE = Put.USAGE;
    public static final String DESCRIPTION = "Identical to the -put command.";
  }

  public static class CopyToLocal extends Get {
    public static final String NAME = "copyToLocal";
    public static final String USAGE = Get.USAGE;
    public static final String DESCRIPTION = "Identical to the -get command.";
  }

As you may notice, exactly the same holds for get/copyToLocal.
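
To see why a subclass that overrides nothing but its name behaves identically to its parent, here is a small self-contained sketch. The classes below are hypothetical stand-ins, not Hadoop's real Put and CopyFromLocal; the name() and run() methods are invented for illustration only.

```java
public class AliasSketch {
    // Hypothetical stand-in for a shell command class.
    static class Put {
        String name() { return "put"; }

        // The actual behavior lives here and is inherited by subclasses.
        String run(String src, String dst) {
            return "copy " + src + " -> " + dst;
        }
    }

    // Alias command: only the name differs, run() is inherited unchanged.
    static class CopyFromLocal extends Put {
        @Override
        String name() { return "copyFromLocal"; }
    }

    public static void main(String[] args) {
        Put put = new Put();
        Put alias = new CopyFromLocal();
        // Different names, same behavior for the same arguments.
        System.out.println(put.name() + ": " + put.run("abc.txt", "/tmp/dir"));
        System.out.println(alias.name() + ": " + alias.run("abc.txt", "/tmp/dir"));
    }
}
```

Since the subclass adds no logic of its own, any difference between the two commands would have to come from the parent class, which both share.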


  • Both are the same, except:
  • -copyFromLocal is restricted to copying from the local filesystem, while -put can take a file from any source (another HDFS, the local filesystem, ...).