
How to find the size of an HDFS file

Tags:

hadoop

hdfs

How do I find the size of an HDFS file? What command should be used to find the size of any file in HDFS?

asked by priya on Jul 20 '12


People also ask

What is file size in HDFS?

Files in HDFS are broken into block-sized chunks called data blocks. These blocks are stored as independent units. The size of these HDFS data blocks is 128 MB by default.

What is file size and block size in HDFS?

A typical block size used by HDFS is 128 MB. Thus, an HDFS file is chopped up into 128 MB chunks, and if possible, each chunk will reside on a different DataNode.
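The chunking described above can be sketched with plain shell arithmetic. This is a hypothetical example (a 1 GiB file and the default 128 MB block size), not output from a real cluster:

```shell
# Hypothetical example: how many 128 MiB blocks a 1 GiB file occupies.
FILE_SIZE=$((1024 * 1024 * 1024))   # 1 GiB file
BLOCK_SIZE=$((128 * 1024 * 1024))   # default HDFS block size (dfs.blocksize)
# Ceiling division: the last block may be smaller than BLOCK_SIZE.
BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$BLOCKS blocks"   # 8 blocks
```

Note that only the last block of a file occupies less than a full block's worth of space; HDFS does not pad files out to a block boundary.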


1 Answer

I also find myself using hadoop fs -dus <path> a great deal. For example, if a directory on HDFS named "/user/frylock/input" contains 100 files and you need the total size for all of those files you could run:

hadoop fs -dus /user/frylock/input 

and you would get back the total size (in bytes) of all of the files in the "/user/frylock/input" directory.
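Worth knowing: `hadoop fs -dus` was deprecated in later Hadoop releases in favor of `hadoop fs -du -s`. A sketch of the common size-checking commands, reusing the hypothetical "/user/frylock/input" path from the answer:

```shell
# Deprecated form used in the answer above:
hadoop fs -dus /user/frylock/input

# Modern equivalents:
hadoop fs -du -s /user/frylock/input      # total size of the directory, in bytes
hadoop fs -du -s -h /user/frylock/input   # same, in human-readable units

# Size of an individual file: the size column of -ls output is in bytes.
hadoop fs -ls /user/frylock/input
```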

Also, keep in mind that HDFS stores data redundantly so the actual physical storage used up by a file might be 3x or more than what is reported by hadoop fs -ls and hadoop fs -dus.
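The replication overhead can be estimated with simple arithmetic. A sketch, assuming the default replication factor of 3 and a hypothetical 200 MiB file (the reported size is logical; physical usage multiplies it by the replication factor):

```shell
LOGICAL=$((200 * 1024 * 1024))        # bytes reported by hadoop fs -du
REPLICATION=3                         # default dfs.replication
PHYSICAL=$(( LOGICAL * REPLICATION )) # approximate bytes on disk across DataNodes
echo "$PHYSICAL"
```

To see a file's actual replication factor, `hadoop fs -stat %r <path>` prints it, and newer Hadoop releases print a second column in `hadoop fs -du` output showing the disk space consumed across all replicas.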

answered by Paul M on Sep 16 '22