 

Default block size in Hadoop 2.x

The default block size in Hadoop 2.x is 128 MB. What is the problem with 64 MB?

asked Dec 21 '25 by Pratik Garg


2 Answers

There are several reasons for the increase in block size. A larger block size improves performance if you are managing a big Hadoop cluster holding petabytes of data.

If you are managing a cluster of 1 petabyte, a 64 MB block size results in 15+ million blocks, which is difficult for the NameNode to manage efficiently.

Having a lot of blocks also results in a lot of mappers during MapReduce execution.
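
To make the arithmetic concrete, here is a minimal Java sketch (illustrative only, not part of the original answer; the 1 PiB data volume and the candidate block sizes are assumptions) that estimates how many blocks, and therefore roughly how many mappers, a given data volume produces:

    public class BlockCountEstimate {
        public static void main(String[] args) {
            long dataSizeBytes = 1L << 50;  // assume 1 PiB of data on the cluster
            long[] blockSizes = {64L << 20, 128L << 20, 256L << 20};  // 64, 128, 256 MiB

            for (long blockSize : blockSizes) {
                long blocks = dataSizeBytes / blockSize;
                // Every block is an object the NameNode must track in memory, and by
                // default each block becomes one input split, i.e. roughly one mapper.
                System.out.printf("%3d MiB blocks -> ~%,d blocks (and mappers)%n",
                        blockSize >> 20, blocks);
            }
        }
    }

At 64 MiB this works out to roughly 16 million blocks per petabyte, versus about 8 million at 128 MiB, which is the NameNode pressure described above.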

Depending on your data requirements, you can fine-tune dfs.blocksize (a minimal client-side sketch follows the list below).

By properly setting your block size (64 MB, 128 MB, 256 MB, or 512 MB), you can achieve:

  1. Improvement in NameNode performance.
  2. Improvement in MapReduce job performance, since the number of mappers depends directly on the block size.
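
As a rough illustration of the tuning knob above, here is a minimal client-side Java sketch (assumed, not from the original answer; the 256 MiB value and the /tmp/blocksize-demo.dat path are hypothetical) that overrides dfs.blocksize for files written by this client:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CustomBlockSizeWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Override the client-side block size for new files; the cluster-wide
            // default normally lives in hdfs-site.xml under dfs.blocksize.
            conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MiB, assumed value

            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("/tmp/blocksize-demo.dat");    // hypothetical output path
            try (FSDataOutputStream stream = fs.create(out)) {
                stream.writeUTF("written with a 256 MiB block size");
            }
        }
    }

Note that the block size only applies to files written after the change; existing files keep the block size they were written with.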

Refer to this link for more details.

answered Dec 24 '25 by Ravindra babu


HDFS blocks are so large in order to minimize seek time. The optimal block size depends on the average file size, the seek time, and the transfer rate.

The faster the disk, the bigger the data block can be, but there is a limit.

To take advantage of data locality, splits are the same size as data blocks; since one map task is started per split, blocks that are too big reduce parallelism. So the best is (see the rough worked example after this list):

  1. Keep seek time low ( --> increase block size on fast disks ).
  2. Keep the number of splits from getting too low ( --> decrease block size ).
  3. Take advantage of data locality ( --> keep split size close to block size ).
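
As a rough worked example (the hardware numbers are assumed for illustration, not taken from the answer): with an average seek time around 10 ms and a sequential transfer rate around 100 MB/s, keeping seek overhead near 1% of read time already points to blocks of roughly 100 MB:

    public class SeekOverheadEstimate {
        public static void main(String[] args) {
            // Assumed, illustrative disk characteristics.
            double seekTimeSeconds = 0.010;           // ~10 ms average seek
            double transferRateBytesPerSec = 100e6;   // ~100 MB/s sequential read
            double targetSeekFraction = 0.01;         // seeks should cost ~1% of read time

            // transferTime = blockSize / transferRate; requiring seekTime <= 1% of
            // transferTime gives blockSize >= seekTime * transferRate / targetSeekFraction.
            double minBlockSizeBytes = seekTimeSeconds * transferRateBytesPerSec / targetSeekFraction;
            System.out.printf("Block size for ~1%% seek overhead: ~%.0f MB%n",
                    minBlockSizeBytes / 1e6);
        }
    }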

128 MB is a good choice for today's disk speeds, disk sizes, and computing performance.

answered Dec 24 '25 by ozw1z5rd