How can I view how many blocks a file has been broken into in a Hadoop file system?
We can use the Hadoop file system check (fsck) command to find out which blocks make up a specific file.
A typical block size used by HDFS is 128 MB. Thus, an HDFS file is chopped up into 128 MB chunks, and if possible, each chunk will reside on a different DataNode.
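As a quick sanity check (a minimal sketch; the path /user/hadoop/test.txt is only a placeholder, and the %o and %r format options require a reasonably recent Hadoop release), you can print a file's size, block size and replication factor with hdfs dfs -stat and estimate the block count as the file size divided by the block size, rounded up:
hdfs dfs -stat "size=%b blocksize=%o replication=%r" /user/hadoop/test.txt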
hdfs fsck / -files -blocks -locations only lets you inspect one path at a time. We use this tool to check whether a huge Parquet table is distributed evenly across nodes and disks, and to verify that data-processing skew is not caused by an uneven data distribution.
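As a rough, hedged sketch of such a distribution check (the table path is a placeholder, and the grep pattern assumes the -locations output includes each replica's DataNode address as ip:port), you can count how many block replicas land on each DataNode:
hdfs fsck /warehouse/big_table -files -blocks -locations | grep -oE '[0-9]+(\.[0-9]+){3}:[0-9]+' | sort | uniq -c | sort -rn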
The general syntax of the command is:
hadoop fsck [path] [options]
To view the blocks for a specific file:
hadoop fsck /path/to/file -files -blocks
hadoop fsck filetopath
I used the above command on CDH 5 and got the error below:
hadoop-hdfs/bin/hdfs: line 262: exec: : not found
The command below worked instead:
hdfs fsck filetopath
It is a good idea to use hdfs instead of hadoop, as the 'hadoop fsck' form is deprecated.
Here is the command using hdfs. To find the details of a file named 'test.txt' in the root directory, you would run:
hdfs fsck /test.txt -files -blocks -locations
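If you only want the number of blocks (a hedged example; it assumes each block appears in the fsck output on a line containing its blk_ ID, with a 'Total blocks (validated)' line in the summary), you can count them directly:
hdfs fsck /test.txt -files -blocks | grep -c "blk_"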