In Hadoop fs, how do I look up the block size for a particular file?
I am primarily interested in a command-line solution, something like:
hadoop fs ... hdfs://fs1.data/...
But it looks like that does not exist. Is there a Java solution?
Suppose we have a file of size 612 MB, and we are using the default block configuration (128 MB). Then five blocks are created: the first four blocks are 128 MB each, and the fifth block is 100 MB (128*4+100=612).
In general, a typical disk seek time is 10 ms and a typical transfer rate is 100 MB/s. To keep the seek time at about 1% of the transfer time, a block should take roughly 1 s to read, which at 100 MB/s means a block of about 100 MB. Hence, to amortize the cost of disk seeks, the default HDFS block size is 64 MB/128 MB.
HDFS stores each file as a set of blocks and distributes them across the Hadoop cluster. The default size of a block in HDFS is 128 MB (Hadoop 2.x) and 64 MB (Hadoop 1.x), which is much larger than a typical Linux file system, where the block size is 4 KB.
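If you want to see this layout programmatically, here is a minimal Java sketch (the file path is a placeholder passed on the command line) that lists each block of a file via FileSystem.getFileBlockLocations; a 612 MB file with a 128 MB block size would produce five entries:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]);  // e.g. hdfs://namenode/path/to/file
        FileSystem fs = file.getFileSystem(new Configuration());
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}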
The fsck commands in the other answers list the blocks and allow you to see the number of blocks. However, to see the actual block size in bytes with no extra cruft, do:
hadoop fs -stat %o /filename
Default block size is:
hdfs getconf -confKey dfs.blocksize
The units for the block size are not documented in the hadoop fs -stat command; however, looking at the source line and the docs for the method it calls, we can see it uses bytes and cannot report block sizes over about 9 exabytes.
The units for the hdfs getconf command may not be bytes. It returns whatever string is being used for dfs.blocksize in the configuration file. (This is seen in the source for the final function and its indirect caller.)
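To answer the Java part of the question, here is a minimal sketch (the path argument is a placeholder) that reads the per-file block size via FileStatus.getBlockSize(), which is the same value hadoop fs -stat %o prints, and the configured default via FileSystem.getDefaultBlockSize(); both are in bytes:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeLookup {
    public static void main(String[] args) throws Exception {
        // Path to inspect; replace with your own file.
        Path file = new Path(args[0]);

        Configuration conf = new Configuration();
        FileSystem fs = file.getFileSystem(conf);

        // Per-file block size in bytes (what "hadoop fs -stat %o" prints).
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size (bytes): " + status.getBlockSize());

        // Default block size for this file system (dfs.blocksize), in bytes.
        System.out.println("Default block size (bytes): " + fs.getDefaultBlockSize(file));
    }
}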
It seems hadoop fs doesn't have an option for this, but hadoop fsck does. You can try this:
$HADOOP_HOME/bin/hadoop fsck /path/to/file -files -blocks
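If you also want to see which datanodes hold each block, fsck accepts a -locations flag in addition to -files and -blocks, for example:
$HADOOP_HOME/bin/hadoop fsck /path/to/file -files -blocks -locations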