 

Moving data to hdfs using copyFromLocal switch

Tags:

hadoop

hdfs

I don't know what's going on here, but I am trying to copy a simple file from a directory in my local filesystem to the directory specified for HDFS.

In my hdfs-site.xml I have specified that the directory for HDFS will be /home/vaibhav/Hadoop/dataNodeHadoopData, using the following properties -

<property>
  <name>dfs.data.dir</name>
  <value>/home/vaibhav/Hadoop/dataNodeHadoopData/</value>
</property>

and 

<property>
  <name>dfs.name.dir</name>
  <value>/home/vaibhav/Hadoop/dataNodeHadoopData/</value>
</property>

I am using the following command -

bin/hadoop dfs -copyFromLocal /home/vaibhav/ml-100k/u.data /home/vaibhav/Hadoop/dataNodeHadoopData

to copy the file u.data from its local filesystem location to the directory that I specified as the HDFS directory. But when I do this, nothing happens - no error, nothing. And no file gets copied to HDFS. Am I doing something wrong? Could there be a permissions issue?

Suggestions needed.

I am using pseudo-distributed, single-node mode.

Also, on a related note: in my MapReduce program I have set the configuration to point to the input file path /home/vaibhav/ml-100k/u.data. Would it not automatically copy the file from the given location to HDFS?

asked Feb 05 '13 by Kumar Vaibhav


People also ask

How do you transfer data from the local filesystem to HDFS?

In order to copy a file from the local file system to HDFS, use hadoop fs -put or hdfs dfs -put. In the put command, specify the local file path you want to copy from, followed by the HDFS path you want to copy to. If the file already exists on HDFS, you will get an error message saying "File already exists".
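For example, assuming your HDFS home directory is /user/vaibhav (both paths here are illustrative):

hdfs dfs -put /home/vaibhav/ml-100k/u.data /user/vaibhav/u.data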

How do you copy a file from a server to HDFS?

The Hadoop copyFromLocal command is used to copy a file from your local file system to HDFS (the Hadoop Distributed File System). copyFromLocal has an optional switch -f which replaces a file that already exists in the destination, meaning it can be used to update that file.
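For example, to overwrite a copy that already exists on HDFS (paths are illustrative):

hdfs dfs -copyFromLocal -f /home/vaibhav/ml-100k/u.data /user/vaibhav/u.data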

What is the difference between put and copyFromLocal in Hadoop?

-copyFromLocal can copy only a single source, i.e. from the local file system to the destination file system, whereas -put can copy single or multiple sources from the local file system to the destination file system.
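For example, -put can take several local files in one invocation (file names are illustrative):

hdfs dfs -put file1.txt file2.txt /user/vaibhav/input/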


1 Answer

I believe dfs.data.dir and dfs.name.dir have to point to two different, existing directories. Furthermore, make sure you format the namenode filesystem after changing the directories in the configuration.
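If you have not done that yet, a typical sequence on a Hadoop 1.x pseudo-distributed setup looks roughly like the following (note that formatting erases any existing HDFS metadata):

bin/stop-all.sh
bin/hadoop namenode -format
bin/start-all.sh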

While copying to HDFS you're incorrectly specifying the target. The correct syntax for copying a local file to HDFS is:

bin/hadoop dfs -copyFromLocal <local_FS_filename> <target_on_HDFS>

Example:

bin/hadoop dfs -copyFromLocal /home/vaibhav/ml-100k/u.data my.data

This would create a file my.data in your user's home directory in HDFS. Before copying files to HDFS, make sure you first master listing directory contents and creating directories.
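For example, to inspect your HDFS home directory, create a target directory and copy into it (the directory name is illustrative):

bin/hadoop dfs -ls
bin/hadoop dfs -mkdir mydata
bin/hadoop dfs -copyFromLocal /home/vaibhav/ml-100k/u.data mydata/u.data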

answered Nov 15 '22 by harpun