I am trying to understand where Hadoop stores data in HDFS. I referred to the config files, viz. core-site.xml and hdfs-site.xml. The properties I have set are:
In core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
In hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/hdfs/datanode</value>
</property>
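As an aside, these *-site.xml files are plain XML and can be read back with any XML tool. A minimal sketch in Python (the hdfs-site.xml fragment is inlined here for illustration, with the `<configuration>` wrapper that real files have):

```python
import xml.etree.ElementTree as ET

# The hdfs-site.xml fragment from above, inlined for illustration.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/hdfs/datanode</value>
  </property>
</configuration>
"""

def parse_props(xml_text):
    """Return a dict of <name> -> <value> from a Hadoop *-site.xml body."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

props = parse_props(HDFS_SITE)
print(props["dfs.datanode.data.dir"])  # file:/hadoop/hdfs/datanode
```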
With the above arrangement, the data blocks should be stored in the directory given by dfs.datanode.data.dir. Is this correct?
I referred to the Apache Hadoop documentation, and from it I see this:
core-default.xml: hadoop.tmp.dir --> A base for other temporary directories.
hdfs-default.xml: dfs.datanode.data.dir --> Determines where on the local filesystem a DFS data node should store its blocks.
The default value for this property is -> file://${hadoop.tmp.dir}/dfs/data
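To make that default concrete, here is a small sketch (plain Python, not Hadoop code) of how the ${hadoop.tmp.dir} reference in the default expands, and how an explicitly configured value takes precedence over the default:

```python
import re

# Values from the question's config files.
conf = {
    "hadoop.tmp.dir": "/hadoop/tmp",
    "dfs.datanode.data.dir": "file:/hadoop/hdfs/datanode",  # explicit override
}
DEFAULT_DATA_DIR = "file://${hadoop.tmp.dir}/dfs/data"  # from hdfs-default.xml

def resolve(conf, key, default):
    """Return the configured value if present, else the default with
    ${var} references expanded -- a simplified model of Hadoop's
    Configuration variable substitution."""
    raw = conf.get(key, default)
    return re.sub(r"\$\{([^}]+)\}", lambda m: conf[m.group(1)], raw)

# Explicit value wins, used verbatim; no /dfs/data suffix is appended.
print(resolve(conf, "dfs.datanode.data.dir", DEFAULT_DATA_DIR))
# -> file:/hadoop/hdfs/datanode

# If the property were NOT set, the default would expand like this:
print(resolve({"hadoop.tmp.dir": "/hadoop/tmp"},
              "dfs.datanode.data.dir", DEFAULT_DATA_DIR))
# -> file:///hadoop/tmp/dfs/data
```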
Since I explicitly provided the value for dfs.datanode.data.dir (in hdfs-site.xml), does that mean data will be stored in that location? If so, would dfs/data be appended to ${dfs.datanode.data.dir}, specifically would it become /hadoop/hdfs/datanode/dfs/data?
However, I didn't see this directory structure being created.
One observation from my environment: after I run some MapReduce programs, the directory /hadoop/tmp/dfs/data is created.
So I am not sure whether data gets stored in the directory suggested by the property dfs.datanode.data.dir.
Has anyone had a similar experience?
HDFS has a primary NameNode, which keeps track of where file data is kept in the cluster. HDFS also has multiple DataNodes running on commodity hardware, typically one per node in the cluster. The DataNodes are generally organized within the same rack in the data center.
The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files. It stores each block of HDFS data in a separate file in its local file system.
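As an illustration of that block-per-file layout, the paths a DataNode creates under dfs.datanode.data.dir typically look like the sketch below. The block-pool ID, block ID, and generation stamp here are made-up examples, not values from any real cluster:

```python
import os.path

# Hypothetical identifiers -- real values come from the cluster.
DATA_DIR = "/hadoop/hdfs/datanode"
BLOCK_POOL = "BP-1234567890-10.0.0.1-1700000000000"  # example only
BLOCK_ID = 1073741825                                # example only

# A finalized block is stored as a pair of local files: the block data
# itself and a small .meta file holding its checksums.
block_dir = os.path.join(DATA_DIR, "current", BLOCK_POOL,
                         "current", "finalized", "subdir0", "subdir0")
block_file = os.path.join(block_dir, f"blk_{BLOCK_ID}")
meta_file = block_file + "_1001.meta"  # suffix is the generation stamp

print(block_file)
print(meta_file)
```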
The default setting is ${hadoop.tmp.dir}/dfs/data, and note that ${hadoop.tmp.dir} is itself configured in core-site.xml (in your case /hadoop/tmp), which matches the /hadoop/tmp/dfs/data directory you observed.
Hadoop has a file system that is much like the one on your desktop computer, but it distributes files across many machines. HDFS splits each file into large blocks and stores replicas of those blocks across the DataNodes; processing frameworks such as MapReduce then read and analyze the data stored in HDFS.
The data for HDFS files will be stored in the directory specified in dfs.datanode.data.dir, and the /dfs/data suffix that you see in the default value will not be appended.
If you edit hdfs-site.xml, you'll have to restart the DataNode service for the change to take effect. Also remember that changing the value will prevent the DataNode service from supplying blocks that were stored in the previous location.
Lastly, note that your values are specified with file:/... instead of file://.... File URIs do need that extra slash, so the malformed URIs might be causing these values to revert to the defaults.