
The way to check a HDFS directory's size?



Prior to 0.20.203, and officially deprecated in 2.6.0:

hadoop fs -dus [directory]

Since 0.20.203 and 1.0.4, and still compatible through 2.6.0:

hdfs dfs -du [-s] [-h] URI [URI …]

You can also run hadoop fs -help for more info and specifics.
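On reasonably recent releases you can also ask for help on just one subcommand, for example:

hadoop fs -help du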


hadoop fs -du -s -h /path/to/dir displays the directory's total size in human-readable form.
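The output looks roughly like this (size first, then path; newer releases add a middle column with the space consumed across all replicas; the figures here are only illustrative):

2.1 G  6.3 G  /path/to/dir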


Extending Matt D's and the other answers: up through Apache Hadoop 3.0.0, the command is

hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

It displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.

Options:

  • The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1-level deep from the given path.
  • The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
  • The -v option will display the names of columns as a header line.
  • The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.

du returns three columns with the following format:

 size    disk_space_consumed_with_all_replicas    full_path_name

Example command:

hadoop fs -du /user/hadoop/dir1 \
    /user/hadoop/file1 \
    hdfs://nn.example.com/user/hadoop/dir1 
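Each file argument, and each entry one level below a directory argument, gets a line in the three-column format above. With made-up byte counts and a replication factor of 3, the output could look something like:

1048576   3145728   /user/hadoop/dir1/part-00000
2097152   6291456   /user/hadoop/dir1/part-00001
1048576   3145728   /user/hadoop/file1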

Exit Code: Returns 0 on success and -1 on error.

source: Apache doc


This prints each entry's size in GB:

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
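A variation on the same idea, if you only want a single aggregate figure for the directory (this assumes the first column is a plain byte count, i.e. -du -s without -h):

hdfs dfs -du -s PATHTODIRECTORY | awk '{ printf "%.2f GB\n", $1/(1024**3) }'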

When trying to calculate the total for a particular group of files within a directory, the -s option does not work (in Hadoop 2.7.1). For example:

Directory structure:

some_dir
├── abc.txt
├── count1.txt
├── count2.txt
└── def.txt

Assume each file is 1 KB in size. You can summarize the entire directory with:

hdfs dfs -du -s some_dir
4096 some_dir

However, if I want the sum of all files whose names contain "count", the command falls short:

hdfs dfs -du -s some_dir/count*
1024 some_dir/count1.txt
1024 some_dir/count2.txt

To get around this I usually pass the output through awk.

hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
2048 
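The same pipeline can also report the total in a human-readable unit; for example, printing kilobytes for the files above:

hdfs dfs -du some_dir/count* | awk '{ total += $1 } END { printf "%.1f KB\n", total/1024 }'
2.0 KB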

To get the size of a directory, hdfs dfs -du -s -h /$yourDirectoryName can be used. For a quick cluster-level storage report, hdfs dfsadmin -report can be used.
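If you only want the headline capacity figures from that report, filtering it is straightforward (the field names below are what current releases print, but may vary slightly between versions):

hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Used|DFS Remaining'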


With Hadoop version 2.3.33:

hadoop fs -dus  /path/to/dir  |   awk '{print $2/1024**3 " G"}' 
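On newer releases where -dus is deprecated, a rough equivalent is du -s, which puts the byte count in the first column:

hdfs dfs -du -s /path/to/dir | awk '{print $1/1024**3 " G"}'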
