Where HDFS stores data

I am trying to understand where Hadoop stores data in HDFS. I am looking at the config files core-site.xml and hdfs-site.xml.

The properties I have set are:

  • In core-site.xml:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
    </property>
    
  • In hdfs-site.xml:

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/hdfs/namenode</value>
    </property>
    
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/hdfs/datanode</value>
    </property>
    

With the above arrangement, the data blocks should be stored in the directory given by dfs.datanode.data.dir. Is this correct?

I referred to the Apache Hadoop documentation, and there I see this:

  • core-default.xml: hadoop.tmp.dir --> A base for other temporary directories.

  • hdfs-default.xml: dfs.datanode.data.dir --> Determines where on the local filesystem a DFS data node should store its blocks.

    The default value for this property is file://${hadoop.tmp.dir}/dfs/data

Since I explicitly provided a value for dfs.datanode.data.dir in hdfs-site.xml, does that mean data will be stored in that location? If so, would dfs/data be appended to ${dfs.datanode.data.dir}, so that the path becomes /hadoop/hdfs/datanode/dfs/data?

However, I didn't see this directory structure being created.

One observation from my environment:

After I run some MapReduce programs, the directory /hadoop/tmp/dfs/data gets created.

So I am not sure whether data actually gets stored in the directory given by dfs.datanode.data.dir.

Has anyone had a similar experience?

CuriousMind asked Mar 21 '14 17:03



1 Answer

The data for HDFS files will be stored in the directory specified in dfs.datanode.data.dir. The /dfs/data suffix that you see in the default value is part of the default only and will not be appended to a value you set yourself.
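
To make the "explicit value wins, default is not appended" rule concrete, here is a minimal Python sketch. It is only a simplified model of the lookup, not Hadoop's actual Configuration class: the helper names load_properties and resolve are made up for illustration, the <configuration> wrapper is assumed, and the single default is copied from the hdfs-default.xml excerpt quoted in the question.

```python
import re
import xml.etree.ElementTree as ET

# Default copied from the hdfs-default.xml excerpt quoted in the question.
DEFAULTS = {"dfs.datanode.data.dir": "file://${hadoop.tmp.dir}/dfs/data"}

# The values from the question, wrapped in an assumed <configuration> root.
CORE_SITE = """
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
    </property>
</configuration>
"""

HDFS_SITE = """
<configuration>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/hdfs/datanode</value>
    </property>
</configuration>
"""


def load_properties(xml_text):
    """Hypothetical helper: return {name: value} for each <property>."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.iter("property")
    }


def resolve(name, props):
    """Hypothetical helper: an explicit value replaces the default outright;
    ${var} references are expanded from the other known properties."""
    value = props.get(name, DEFAULTS.get(name))
    if value is None:
        return None
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: props.get(m.group(1), m.group(0)),
        value,
    )


core = load_properties(CORE_SITE)
with_override = {**core, **load_properties(HDFS_SITE)}

# hdfs-site.xml is picked up: the explicit value is used as-is.
print(resolve("dfs.datanode.data.dir", with_override))
# hdfs-site.xml is not picked up: the default expands under hadoop.tmp.dir.
print(resolve("dfs.datanode.data.dir", core))
```

Under this model the second case expands to a path under /hadoop/tmp/dfs/data, which matches the directory you saw being created. That would suggest the DataNode was still running with the default rather than with your hdfs-site.xml value.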

If you edit hdfs-site.xml, you'll have to restart the DataNode service for the change to take effect. Also remember that changing the value will eliminate the ability of the DataNode service to supply blocks that were stored in the previous location.

Lastly, you have specified your values as file:/... instead of file://.... File URIs do need that extra slash, so that might be causing these values to revert to the defaults.
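
If it helps when picking the slash form, this small sketch shows how the three spellings split under generic URI rules, using Python's standard urllib.parse. Hadoop parses these values with its own Java code, so treat this only as an illustration of the syntax and not as proof of what a particular Hadoop version accepts. The describe helper is made up for the example.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Hypothetical helper: return (scheme, authority, path) for a URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in (
    "file:/hadoop/hdfs/datanode",
    "file://hadoop/hdfs/datanode",
    "file:///hadoop/hdfs/datanode",
):
    print(uri, "->", describe(uri))
```

With two slashes followed directly by hadoop, the first path segment is read as the authority (host) part, so if you add slashes, the three-slash form is the one that keeps /hadoop/hdfs/datanode intact as the path.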

RickH answered Nov 22 '22 14:11