
How to know the exact block size of a file on a Hadoop node?

Tags: hadoop, hdfs

I have a 1 GB file that I've put on HDFS. So, it would be broken into blocks and sent to different nodes in the cluster.

Is there any command to identify the exact size of the block of the file on a particular node?

Thanks.

asked Feb 17 '16 by Mayank Porwal


2 Answers

You should use the hdfs fsck command:

hdfs fsck /tmp/test.txt -files -blocks

This command prints information about all the blocks the file consists of:

/tmp/test.tar.gz 151937000 bytes, 2 block(s):  OK
0. BP-739546456-192.168.20.1-1455713910789:blk_1073742021_1197 len=134217728 Live_repl=3
1. BP-739546456-192.168.20.1-1455713910789:blk_1073742022_1198 len=17719272 Live_repl=3

As you can see, the len field in each row shows the actual size of each block in bytes.

hdfs fsck has many other useful options, which are described on the official Hadoop documentation page.
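Since the question asks about blocks on a particular node, you can also add the -locations flag, which prints the datanodes holding each replica of every block (the path below is just the example file from the question):

hdfs fsck /tmp/test.txt -files -blocks -locations

Each block line in the output is then followed by the list of datanode addresses storing that block's replicas, so you can see exactly which node holds which block.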

answered Sep 22 '22 by maxteneff

You can try:

hdfs getconf -confKey dfs.blocksize
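Note that this prints the cluster's default block size in bytes, not the block size recorded for a particular file. To check a specific file, hdfs dfs -stat with the %o format specifier reports that file's block size (%r is its replication factor; the path below is illustrative):

hdfs dfs -stat "Block size: %o, replication: %r" /tmp/test.txt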
answered Sep 23 '22 by Karthik