 

Viewing the number of blocks for a file in Hadoop

Tags: hadoop, hdfs

How can I view how many blocks a file has been broken into in the Hadoop file system (HDFS)?

asked Jun 23 '12 by London guy


People also ask

How do I know how many blocks I have in HDFS?

We can use the Hadoop file system check command (fsck) to see the blocks for a specific file.

What is the size of each data block in the Hadoop file system?

A typical block size used by HDFS is 128 MB. Thus, an HDFS file is chopped up into 128 MB chunks, and if possible, each chunk will reside on a different DataNode.
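For example, a 300 MB file stored with the default 128 MB block size is split into three blocks: two full 128 MB blocks plus one 44 MB block (the last block only occupies as much space as the remaining data needs). On Hadoop 2 and later you can check the configured block size with the following command (on older versions the configuration key was dfs.block.size):

hdfs getconf -confKey dfs.blocksize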

What is the HDFS command to check the blocks and the block locations?

hdfs fsck / -files -blocks -locations shows the blocks, and the DataNodes they are stored on, for every file under the given path; pass a specific file's path instead of / to inspect a single file. We use this tool to see whether a huge Parquet table is distributed evenly across nodes and disks, and to check that data-processing skew is not caused by data-distribution flaws.


3 Answers

We can use the Hadoop file system check command (fsck) to list the blocks that make up a specific file.

Below is the command:

hadoop fsck [path] [options] 
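The options most relevant here are -files (print the files being checked), -blocks (print the block report for each file), and -locations (print the DataNodes hosting each block); all three are standard fsck flags.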

To view the blocks for a specific file:

hadoop fsck /path/to/file -files -blocks 
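For a file of roughly 300 MB, the output looks something like the following (the path, block IDs, and sizes here are illustrative, and the exact layout varies by Hadoop version). The block count appears both on the per-file line and in the "Total blocks" summary:

hadoop fsck /path/to/file -files -blocks
/path/to/file 314572800 bytes, 3 block(s):  OK
0. blk_1073741825_1001 len=134217728 repl=3
1. blk_1073741826_1002 len=134217728 repl=3
2. blk_1073741827_1003 len=46137344 repl=3

 Total blocks (validated):  3 (avg. block size 104857600 B)
The filesystem under path '/path/to/file' is HEALTHY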
answered Sep 22 '22 by Ramana


hadoop fsck filetopath

I used the above command in CDH 5 and got the error below:

hadoop-hdfs/bin/hdfs: line 262: exec: : not found

The command below worked fine instead:

hdfs fsck filetopath

answered Sep 22 '22 by yoga


It is always a good idea to use hdfs instead of hadoop, as the hadoop fsck form is deprecated.

Here is the command with hdfs. To find the details of a file named 'test.txt' in the root, you would write:

hdfs fsck /test.txt -files -blocks -locations
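With -locations, each block line also lists the DataNodes holding its replicas, roughly like this (hosts and IDs invented for illustration; newer releases print DatanodeInfoWithStorage entries instead of bare host:port pairs):

/test.txt 134217728 bytes, 1 block(s):  OK
0. blk_1073741825_1001 len=134217728 repl=3 [10.0.0.2:50010, 10.0.0.3:50010, 10.0.0.4:50010]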

answered Sep 22 '22 by user1795667