 

What is the HDFS Location on Hadoop?

Tags:

java

hadoop

I am trying to run the WordCount example in Hadoop after following some online tutorials. However, it's not clear to me where the file gets copied to in HDFS from our local file system when we execute the following command.

hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /usr/local/myhadoop-tmp/

When I executed the following command, I don't see my python-tutorial.pdf listed on HDFS.

hadoop fs -ls

This is confusing me. I have already specified the "myhadoop-tmp" directory in core-site.xml, and I thought this directory would become the HDFS directory for storing all the input files.

core-site.xml
=============
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/myhadoop-tmp</value>
    <description>A base for other temporary directories.</description>
</property>

If this is not the case, where is HDFS located on my machine? Which configuration setting determines the HDFS directory, and where does the input file go when we copy it from the local file system to HDFS?

Nital asked Oct 17 '13

1 Answer

This is set in the dfs.datanode.data.dir property, which defaults to file://${hadoop.tmp.dir}/dfs/data (see hdfs-default.xml for details).

However, in your case, the problem is that you are not using the full path within HDFS. Instead, do:

hadoop fs -ls /usr/local/myhadoop-tmp/
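A likely reason the file didn't show up earlier: with no path argument, `hadoop fs -ls` lists your HDFS home directory (`/user/<username>` by default), not the directory you copied into. A sketch (these commands assume a running HDFS cluster):

```shell
# With no argument, -ls defaults to /user/<username> in HDFS,
# which is empty here -- the file went to /usr/local/myhadoop-tmp/ instead.
hadoop fs -ls /usr/local/myhadoop-tmp/   # lists python-tutorial.pdf
hadoop fs -ls /                          # browse HDFS from its root
```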

Note that you also seem to be confusing the path within HDFS with the path in your local file system. Within HDFS, your file is in /usr/local/myhadoop-tmp/. On your local system (given your configuration setting), it is under /usr/local/myhadoop-tmp/dfs/data/; in there, there's a directory structure and naming convention defined by HDFS that is independent of whatever path in HDFS you decide to use. Also, it won't have the same name, since it is divided into blocks and each block is assigned a unique ID; the name of a block is something like blk_1073741826.
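To see the difference concretely, you can compare the logical HDFS view of the file with the datanode's physical storage. The exact local directory layout varies by Hadoop version, so treat this as an illustrative sketch:

```shell
# The logical HDFS path, as the hadoop tools see it:
hadoop fs -ls /usr/local/myhadoop-tmp/python-tutorial.pdf

# The physical storage on the datanode (subdirectory names vary
# by Hadoop version -- inspect only, never modify by hand):
ls /usr/local/myhadoop-tmp/dfs/data/
# the file is stored as one or more blk_<id> files plus .meta
# checksum files, not as python-tutorial.pdf
```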

To conclude: the local path used by the datanode is NOT the same as the paths you use in HDFS. You can go into your local directory looking for files, but you should not do this, since you could mess up the HDFS metadata management. Just use the hadoop command-line tools to copy/move/read files within HDFS, using any logical path (in HDFS) that you wish. These paths within HDFS do not need to be tied to the paths you used for your local datanode storage (there is no reason or advantage to doing so).
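Putting this together, a typical workflow that keeps everything in logical HDFS paths might look like the following. The `/user/$(whoami)` home directory is an assumption based on the HDFS default home-directory convention, and the commands assume a running cluster:

```shell
# Create your HDFS home directory if it does not exist yet
hadoop fs -mkdir -p /user/$(whoami)

# Copy the local file into it
hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /user/$(whoami)/

# A bare -ls now finds it, since -ls defaults to /user/<username>
hadoop fs -ls
```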

cabad answered Oct 11 '22