Why Is a Block in HDFS So Large?

Can somebody walk through this calculation and explain it lucidly?

A quick calculation shows that if the seek time is around 10 ms and the transfer rate is 100 MB/s, to make the seek time 1% of the transfer time, we need to make the block size around 100 MB. The default is actually 64 MB, although many HDFS installations use 128 MB blocks. This figure will continue to be revised upward as transfer speeds grow with new generations of disk drives.
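As a quick sanity check on that figure, here is a minimal sketch of the back-of-the-envelope calculation, using only the numbers quoted above (10 ms seek, 100 MB/s transfer, seek time targeted at 1% of transfer time):

```python
# Sketch of the quoted calculation: pick a block size so that
# seek time is ~1% of transfer time.
seek_time = 0.010        # seconds (10 ms)
transfer_rate = 100e6    # bytes per second (100 MB/s)
target_ratio = 0.01      # seek time should be ~1% of transfer time

# seek_time / transfer_time = target_ratio  =>  transfer_time = seek_time / target_ratio
transfer_time = seek_time / target_ratio          # 1 second
block_size = transfer_time * transfer_rate        # 100 MB
print(f"block size ~ {block_size / 1e6:.0f} MB")  # -> 100 MB
```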

Kumar asked Mar 12 '14

People also ask

What is the default block size in HDFS and why is it so large?

The default size of a block in HDFS is 128 MB (Hadoop 2.x) and 64 MB (Hadoop 1.x), which is much larger than the typical Linux filesystem block size of 4 KB. The reason for such a large block size is to minimize the cost of seeks and to reduce the amount of metadata generated per block.

Why is the block size 64 MB in Hadoop?

The reason Hadoop chose 64 MB was that Google chose 64 MB, and Google's choice came down to a Goldilocks argument: a much smaller block size would increase seek overhead, while a much larger one would limit parallelism.

Why did the Hadoop block size increase?

The reasons for increasing the block size from 64 MB to 128 MB are: to improve NameNode performance (fewer blocks means less metadata to track), and to improve MapReduce job performance, since the number of mappers depends directly on the block size.
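As a rough sketch of that mapper relationship, assuming the common case where one mapper is launched per HDFS block (i.e. input split size equals block size; the 1 TiB input size is just an illustrative figure):

```python
# Sketch: count blocks/mappers for a 1 TiB input at 64 MB vs 128 MB blocks,
# assuming one mapper per block.
import math

file_size = 1 * 1024**4              # 1 TiB of input data, in bytes

for block_size_mb in (64, 128):
    block_size = block_size_mb * 1024**2
    mappers = math.ceil(file_size / block_size)   # one block per split here
    print(f"{block_size_mb} MB blocks -> {mappers} blocks / ~{mappers} mappers")

# 64 MB -> 16384 mappers, 128 MB -> 8192 mappers: larger blocks mean fewer
# mappers to schedule and fewer block records for the NameNode to hold in memory.
```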

Why is the data block size set to 128 MB in Hadoop?

At the same time, the block size should not be so large that parallelism can't be achieved, with the system waiting a very long time for one unit of data processing to finish its work. A balance needs to be struck, which is why the default block size is 128 MB.


2 Answers

A block will be stored as a contiguous piece of information on the disk, which means that the total time to read it completely is the time to locate it (seek time) + the time to read its content without doing any more seeks, i.e. sizeOfTheBlock / transferRate = transferTime.

If we keep the ratio seekTime / transferTime small (close to .01 in the text), it means we are reading data from the disk almost as fast as the physical limit imposed by the disk, with minimal time spent looking for information.
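A minimal sketch of that ratio, using the numbers from the question (10 ms seek, 100 MB/s transfer); the block sizes listed are just illustrative:

```python
# What fraction of the total read time is spent seeking, per block size?
SEEK_TIME = 0.010          # seconds per seek
TRANSFER_RATE = 100e6      # bytes per second (100 MB/s)

for label, block_size in [("4 KB", 4 * 1024), ("64 MB", 64e6), ("128 MB", 128e6)]:
    transfer_time = block_size / TRANSFER_RATE
    seek_fraction = SEEK_TIME / (SEEK_TIME + transfer_time)
    print(f"{label:>6}: transfer {transfer_time * 1000:8.2f} ms, "
          f"seek is {seek_fraction:6.1%} of total")

# 4 KB  : seek dominates (~99.6% of the time is spent seeking)
# 64 MB : seek is ~1.5% of the total; 128 MB: ~0.8%
```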

This matters because MapReduce jobs typically traverse (read) the whole data set (an HDFS file, folder, or set of folders) and apply logic to it. Since we have to spend the full transferTime anyway to get all the data off the disk, we try to minimise the time spent on seeks and read in big chunks, hence the large size of the data blocks.

In more traditional disk-access software we typically do not read the whole data set every time, so we would rather pay for plenty of seeks on smaller blocks than waste time transferring data we will never need.

Svend answered Sep 28 '22

Suppose the same 100 MB were divided into ten 10 MB blocks instead of one 100 MB block. You would have to do 10 seeks, and transferring each 10 MB block at 100 MB/s takes 10/100 = 0.1 s. The total time is (10 ms × 10) + (0.1 s × 10) = 1.1 s, compared with 10 ms + 1 s = 1.01 s for a single 100 MB block.
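The same comparison as a tiny sketch (the helper function is just for illustration, using the 10 ms seek and 100 MB/s transfer figures from the question):

```python
# Total time to read `total_mb` of data stored in `block_mb`-sized blocks:
# one seek per block plus the raw transfer time.
def read_time(total_mb, block_mb, seek_s=0.010, rate_mb_s=100):
    blocks = total_mb / block_mb
    return blocks * seek_s + total_mb / rate_mb_s

print(read_time(100, 10))    # ten 10 MB blocks  -> 1.10 s
print(read_time(100, 100))   # one 100 MB block  -> 1.01 s
```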

Shanker Lolakapuri answered Sep 28 '22