
Change block size of dfs file

Tags:

hadoop

My map phase is currently inefficient when parsing one particular set of files (2 TB in total). I'd like to change the block size of those files in the Hadoop DFS from 64 MB to 128 MB, but I can't find anything in the documentation about doing this for only one set of files rather than the entire cluster.

Which command changes the block size when I upload (for example, when copying from the local file system to the DFS)?

asked Apr 19 '10 by Sam


People also ask

Can you change the block size of HDFS files?

You can raise the HDFS block size from the default of 64 MB to 128 MB in order to optimize performance for most use cases. Boosting the block size allows EMC Isilon cluster nodes to read and write HDFS data in larger blocks.

What happens if we increase block size in Hadoop?

When the block size is small, seek overhead increases: the data, when divided into blocks, is spread across a larger number of blocks, and more blocks mean more seeks to read and write the data. Increasing the block size reduces this seek overhead.


3 Answers

In case anyone else finds this question later: I had to slightly change Bkkbrad's answer to get it to work with my setup (Hadoop 0.20 on Ubuntu 10.10):

hadoop fs -D dfs.block.size=134217728 -put local_name remote_location

The setting that worked for me is not fs.local.block.size but rather dfs.block.size.
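If you want to confirm that the upload actually used the larger block size, hadoop fs -stat can print a file's block size; %o is the block-size format option (the path is the placeholder from the command above):

hadoop fs -stat %o remote_location
# prints 134217728 if the file was written with a 128 MB block size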

answered Sep 20 '22 by KWottrich


I've changed my answer! You just need to set the fs.local.block.size configuration setting appropriately when you use the command line:

hadoop fs -D fs.local.block.size=134217728 -put local_name remote_location

Original Answer

You can programmatically specify the block size when you create a file with the Hadoop API. Unfortunately, you can't do this on the command line with the hadoop fs -put command. To do what you want, you'll have to write your own code to copy the local file to the remote location: open a FileInputStream for the local file, create the remote OutputStream with FileSystem.create (which accepts the block size as a parameter), and then copy between the two streams with something like IOUtils.copy from Apache Commons IO.
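A minimal sketch of that approach, using the 0.20-era Hadoop API described in this answer; the buffer size and replication factor are illustrative assumptions, and local_name/remote_location are the placeholders from the commands above:

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Open the local file and create the remote file with an explicit block size.
        InputStream in = new FileInputStream("local_name");
        OutputStream out = fs.create(new Path("remote_location"),
                true,         // overwrite if it already exists
                4096,         // I/O buffer size (assumed value)
                (short) 3,    // replication factor (assumed value)
                134217728L);  // block size: 128 MB

        // Copy the local stream into the remote one, then clean up.
        IOUtils.copy(in, out);
        in.close();
        out.close();
    }
}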

answered Sep 19 '22 by Bkkbrad


You can also modify the block size in your own programs, like this:

Configuration conf = new Configuration();
// Configuration.set() expects a String value, so use setLong() for a number.
conf.setLong("dfs.block.size", 128 * 1024 * 1024);
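For this setting to take effect, the Configuration has to be passed to whatever actually writes the files. Continuing the snippet above (org.apache.hadoop.fs.FileSystem and Path imports assumed; the paths are placeholders, not from the answer):

FileSystem fs = FileSystem.get(conf);
// Files created through this FileSystem pick up the block size set in conf.
fs.copyFromLocalFile(new Path("local_name"), new Path("remote_location"));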
answered Sep 21 '22 by inuyasha1027