
Change Block size of existing files in Hadoop

Tags: hadoop, hdfs

Consider a Hadoop cluster where the default block size is 64MB in hdfs-site.xml. Later, the team decides to change this to 128MB. Here are my questions for this scenario:

  1. Will this change require a restart of the cluster, or will it be picked up automatically so that all new files have the default block size of 128MB?
  2. What will happen to the existing files which have a block size of 64MB? Will the configuration change apply to existing files automatically? If so, when will this be done - as soon as the change is made, or when the cluster is restarted? If not, how can the block size of existing files be changed manually?
asked Apr 13 '15 by divinedragon

1 Answer

Will this change require a restart of the cluster, or will it be picked up automatically so that all new files have the default block size of 128MB?

A restart of the cluster will be required for this property change to take effect.
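For reference, this is a sketch of the entry you would change in hdfs-site.xml; in Hadoop 2.x and later the property is named dfs.blocksize (older 1.x releases use dfs.block.size), and the value below is 128MB expressed in bytes:

  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>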

What will happen to the existing files which have a block size of 64MB? Will the configuration change apply to existing files automatically?

Existing files will keep their 64MB block size; the new value only applies to files written after the change.
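If you want to confirm this yourself, you can check the block size HDFS reports for an existing file (the path below is a placeholder):

  hadoop fs -stat "Block size: %o" /path/to/existing/file
  hdfs fsck /path/to/existing/file -files -blocks

Files written before the change will still report 67108864 (64MB), while files written afterwards will report 134217728 (128MB).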

If not automatically done, then how to manually do this block change?

To change the block size of existing files you can use distcp, which copies the files over with the new block size. However, you will have to delete the old files with the older block size yourself. Here's a command you can use:

hadoop distcp -Ddfs.block.size=XX /path/to/old/files /path/to/new/files/with/larger/block/sizes
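Here XX is the new block size in bytes (134217728 for 128MB). As a hypothetical end-to-end example with placeholder paths, copying the data with the larger block size and then swapping it into place might look like:

  hadoop distcp -Ddfs.block.size=134217728 /data/old /data/old_128m
  hadoop fs -rm -r /data/old
  hadoop fs -mv /data/old_128m /data/old

On Hadoop 2.x and later the preferred property name is dfs.blocksize, so -Ddfs.blocksize=134217728 works there as well.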
answered Sep 27 '22 by Abbas Gadhia