
Hadoop fs lookup for block size?

Tags:

hadoop

hdfs

In Hadoop fs, how do I look up the block size for a particular file?

I was primarily interested in a command-line solution, something like:

hadoop fs ... hdfs://fs1.data/...

But it looks like that does not exist. Is there a Java solution?

Asked Dec 07 '11 by Aleksandr Levchuk

People also ask

How does Hadoop calculate block size?

Suppose we have a file of size 612 MB and we are using the default block size (128 MB). Five blocks are created: the first four blocks are 128 MB each, and the fifth block is 100 MB (4×128 + 100 = 612).
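The same arithmetic can be sketched in shell (the 612 MB file size and 128 MB block size are the example values assumed above, not read from any cluster):

```shell
# Block count for an example 612 MB file with 128 MB blocks
filesize=$((612 * 1024 * 1024))
blocksize=$((128 * 1024 * 1024))
# round up: number of blocks needed to hold the file
nblocks=$(( (filesize + blocksize - 1) / blocksize ))
# size of the final, partially filled block
last=$(( filesize - (nblocks - 1) * blocksize ))
echo "$nblocks blocks, last block $((last / 1024 / 1024)) MB"   # prints: 5 blocks, last block 100 MB
```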

Why block size is 64MB in Hadoop?

In general, the seek time is about 10 ms and the disk transfer rate is about 100 MB/s. To keep the seek time at roughly 1% of the transfer time, a block should take about 1 s to transfer, i.e. be about 100 MB. Hence, to reduce the relative cost of disk seeks, the HDFS default block size is 64 MB/128 MB.
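That rule of thumb can be checked numerically in shell (the 10 ms seek time and 100 MB/s transfer rate are the assumed figures from the paragraph above):

```shell
# Keep seek time at ~1% of transfer time
seek_ms=10
rate_mb_per_s=100
transfer_ms=$((seek_ms * 100))                    # transfer should take 100x the seek
block_mb=$((rate_mb_per_s * transfer_ms / 1000))  # MB transferred in that time
echo "${block_mb} MB"                             # prints: 100 MB
```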

What is the block size of Hadoop cluster?

HDFS stores each file as a set of blocks and distributes them across the Hadoop cluster. The default block size in HDFS is 128 MB (Hadoop 2.x) or 64 MB (Hadoop 1.x), which is much larger than on a Linux filesystem, where the block size is typically 4 KB.


2 Answers

The fsck commands in the other answers list the blocks and allow you to see the number of blocks. However, to see the actual block size in bytes with no extra cruft, run:

hadoop fs -stat %o /filename

The default block size is given by:

hdfs getconf -confKey dfs.blocksize

Details about units

The units for the block size are not documented in the hadoop fs -stat command. However, looking at the source line and the docs for the method it calls, we can see it uses bytes and cannot report block sizes over about 9 exabytes.

The units for the hdfs getconf command may not be bytes: it returns whatever string is being used for dfs.blocksize in the configuration file. (This can be seen in the source for the final function and its indirect caller.)
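When the configuration stores the raw byte value, a quick shell conversion gives a human-readable size (134217728 here is an assumed example value, not read from any cluster):

```shell
# Convert an assumed dfs.blocksize value in bytes to MB;
# substitute the actual output of: hdfs getconf -confKey dfs.blocksize
bytes=134217728
mb=$((bytes / 1024 / 1024))
echo "${mb} MB"    # prints: 128 MB
```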

Answered Oct 05 '22 by Eponymous

It seems hadoop fs doesn't have an option for this.

But hadoop fsck does. You can try this:

$HADOOP_HOME/bin/hadoop fsck /path/to/file -files -blocks

Answered Oct 05 '22 by Chris Zheng