
HDFS space consumed: "hdfs dfs -du /" vs "hdfs dfsadmin -report"

Tags: hadoop, hdfs

Which tool is the right one to measure HDFS space consumed?

When I sum up the output of "hdfs dfs -du /", I always get less space consumed than the "DFS Used" line of "hdfs dfsadmin -report" shows. Is there data that du does not take into account?

asked Nov 04 '15 09:11 by facha




1 Answer

The Hadoop file system provides reliable storage by putting copies of the data on several nodes. The number of copies is the replication factor, and it is usually greater than one.

The command hdfs dfs -du / shows the space consumed by your data without replication.

The command hdfs dfsadmin -report (line "DFS Used") shows the actual disk usage, taking data replication into account. So it should be several times bigger than the number you get from the dfs -du command.
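As a rough illustration of the arithmetic, here is a minimal Python sketch. The sample du output and the helper names are hypothetical, and it assumes every file uses the same replication factor (3 is the HDFS default for dfs.replication). Real clusters can mix replication factors, so treat the result as an estimate rather than an exact match for "DFS Used".

```python
# Hypothetical sample of `hdfs dfs -du /` output: "<bytes> <path>" per line.
# The paths and sizes are made up for illustration only.
SAMPLE_DU_OUTPUT = """\
1073741824  /data
536870912  /tmp
268435456  /user
"""


def logical_bytes(du_output: str) -> int:
    """Sum the first column of du output (size without replication)."""
    total = 0
    for line in du_output.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            total += int(fields[0])
    return total


def estimated_raw_bytes(logical: int, replication: int = 3) -> int:
    """Estimate raw disk usage, assuming one uniform replication factor."""
    return logical * replication


if __name__ == "__main__":
    logical = logical_bytes(SAMPLE_DU_OUTPUT)
    raw = estimated_raw_bytes(logical, replication=3)
    print(f"logical size (du):        {logical} bytes")
    print(f"estimated raw (DFS Used): {raw} bytes")
```

With this sample the du total is 1879048192 bytes, and the estimate for "DFS Used" at replication factor 3 is 5637144576 bytes, which is the kind of gap described in the question.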

answered Oct 12 '22 08:10 by Alexander Kuznetsov
