In Hadoop fs, how do I look up the block size for a particular file?
I am primarily interested in a command-line solution, something like:
hadoop fs ... hdfs://fs1.data/...
But it looks like that does not exist. Is there a Java solution?
Suppose we have a file of size 612 MB, and we are using the default block configuration (128 MB). Then five blocks are created: the first four blocks are 128 MB each, and the fifth block is 100 MB (128*4+100=612).
In general, a typical disk seek time is 10 ms and a typical transfer rate is 100 MB/s. To keep the seek time at about 1% of the transfer time, a block should take roughly 1 s to read, which at 100 MB/s means a block of about 100 MB. Hence, to amortize the cost of disk seeks, the default HDFS block size is 64 MB/128 MB.
HDFS stores each file as a set of blocks and distributes them across the Hadoop cluster. The default size of a block in HDFS is 128 MB (Hadoop 2.x) and 64 MB (Hadoop 1.x), which is much larger than a typical Linux file system, where the block size is 4 KB.
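If you want to see this layout programmatically, here is a minimal Java sketch (the file path is a placeholder passed on the command line) that lists each block of a file via FileSystem.getFileBlockLocations; a 612 MB file with a 128 MB block size would produce five entries:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]);  // e.g. hdfs://namenode/path/to/file
        FileSystem fs = file.getFileSystem(new Configuration());
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}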
The fsck commands in the other answers list the blocks and allow you to see the number of blocks. However, to see the actual block size in bytes with no extra cruft, do:
hadoop fs -stat %o /filename
Default block size is:
hdfs getconf -confKey dfs.blocksize
The units for the block size are not documented in the hadoop fs -stat command; however, looking at the source line and the docs for the method it calls, we can see it uses bytes and cannot report block sizes over about 9 exabytes.
The units for the hdfs getconf command may not be bytes. It returns whatever string is being used for dfs.blocksize in the configuration file. (This is seen in the source for the final function and its indirect caller.)
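To answer the Java part of the question, here is a minimal sketch (the path argument is a placeholder) that reads the per-file block size via FileStatus.getBlockSize(), which is the same value hadoop fs -stat %o prints, and the configured default via FileSystem.getDefaultBlockSize(); both are in bytes:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeLookup {
    public static void main(String[] args) throws Exception {
        // Path to inspect; replace with your own file.
        Path file = new Path(args[0]);

        Configuration conf = new Configuration();
        FileSystem fs = file.getFileSystem(conf);

        // Per-file block size in bytes (what "hadoop fs -stat %o" prints).
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size (bytes): " + status.getBlockSize());

        // Default block size for this file system (dfs.blocksize), in bytes.
        System.out.println("Default block size (bytes): " + fs.getDefaultBlockSize(file));
    }
}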
It seems hadoop fs doesn't have an option for this, but hadoop fsck does. You can try this:
$HADOOP_HOME/bin/hadoop fsck /path/to/file -files -blocks
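If you also want to see which datanodes hold each block, fsck accepts a -locations flag in addition to -files and -blocks, for example:
$HADOOP_HOME/bin/hadoop fsck /path/to/file -files -blocks -locations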