 

Default block size in Hadoop 2.x

The default block size in Hadoop 2.x is 128 MB. What is the problem with 64 MB?

asked Dec 21 '25 by Pratik Garg


2 Answers

There are several reasons for the increase in block size. A larger block size improves performance if you are managing a big Hadoop cluster holding petabytes of data.

If you are managing a cluster of 1 petabyte, a 64 MB block size results in 15+ million blocks, which is difficult for the NameNode to manage efficiently.

Having a lot of blocks also results in a lot of mappers during MapReduce execution.
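
To make the arithmetic concrete, here is a minimal Java sketch (illustrative only, not part of the original answer; the 1 PiB data volume and the candidate block sizes are assumptions) that estimates how many blocks, and therefore roughly how many mappers, a given data volume produces:

    public class BlockCountEstimate {
        public static void main(String[] args) {
            long dataSizeBytes = 1L << 50;  // assume 1 PiB of data on the cluster
            long[] blockSizes = {64L << 20, 128L << 20, 256L << 20};  // 64, 128, 256 MiB

            for (long blockSize : blockSizes) {
                long blocks = dataSizeBytes / blockSize;
                // Every block is an object the NameNode must track in memory, and by
                // default each block becomes one input split, i.e. roughly one mapper.
                System.out.printf("%3d MiB blocks -> ~%,d blocks (and mappers)%n",
                        blockSize >> 20, blocks);
            }
        }
    }

At 64 MiB this works out to roughly 16 million blocks per petabyte, versus about 8 million at 128 MiB, which is the NameNode pressure described above.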

Depending on your data requirements, you can fine-tune dfs.blocksize (a minimal client-side sketch follows the list below).

By properly setting your block size (64 MB, 128 MB, 256 MB, or 512 MB), you can achieve:

  1. Improvement in NameNode performance.
  2. Improvement in MapReduce job performance, since the number of mappers depends directly on the block size.
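
As a rough illustration of the tuning knob above, here is a minimal client-side Java sketch (assumed, not from the original answer; the 256 MiB value and the /tmp/blocksize-demo.dat path are hypothetical) that overrides dfs.blocksize for files written by this client:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CustomBlockSizeWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Override the client-side block size for new files; the cluster-wide
            // default normally lives in hdfs-site.xml under dfs.blocksize.
            conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MiB, assumed value

            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("/tmp/blocksize-demo.dat");    // hypothetical output path
            try (FSDataOutputStream stream = fs.create(out)) {
                stream.writeUTF("written with a 256 MiB block size");
            }
        }
    }

Note that the block size only applies to files written after the change; existing files keep the block size they were written with.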

Refer to this link for more details.

answered Dec 24 '25 by Ravindra babu


HDFS blocks are so large in order to minimize seek time. The optimal block size depends on the average file size, the seek time, and the transfer rate.

The faster the disk, the bigger the data block can be, but there is a limit.

To take advantage of data locality, splits are the same size as data blocks; since one map task is started per split, blocks that are too big reduce parallelism. So the best is (see the rough worked example after this list):

  1. Keep seek time low ( --> increase block size on fast disks ).
  2. Keep the number of splits from getting too low ( --> decrease block size ).
  3. Take advantage of data locality ( --> keep split size close to block size ).
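
As a rough worked example (the hardware numbers are assumed for illustration, not taken from the answer): with an average seek time around 10 ms and a sequential transfer rate around 100 MB/s, keeping seek overhead near 1% of read time already points to blocks of roughly 100 MB:

    public class SeekOverheadEstimate {
        public static void main(String[] args) {
            // Assumed, illustrative disk characteristics.
            double seekTimeSeconds = 0.010;           // ~10 ms average seek
            double transferRateBytesPerSec = 100e6;   // ~100 MB/s sequential read
            double targetSeekFraction = 0.01;         // seeks should cost ~1% of read time

            // transferTime = blockSize / transferRate; requiring seekTime <= 1% of
            // transferTime gives blockSize >= seekTime * transferRate / targetSeekFraction.
            double minBlockSizeBytes = seekTimeSeconds * transferRateBytesPerSec / targetSeekFraction;
            System.out.printf("Block size for ~1%% seek overhead: ~%.0f MB%n",
                    minBlockSizeBytes / 1e6);
        }
    }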

128 MB is a good choice for today's disk speeds, disk sizes, and computing performance.

answered Dec 24 '25 by ozw1z5rd