 

Hadoop merge files

Tags: hive, hiveql

I ran a map-only job with 674 mappers (the number Hive chose), and it generated 674 .gz files. I want to merge these into around 30-35 files. I have tried Hive's merge property for map-only jobs (hive.merge.mapfiles), but I am not getting merged output.

Asked Oct 01 '16 18:10 by Raj Abhishek



1 Answer

Try using the Tez execution engine and enabling hive.merge.tezfiles. You may also want to specify the target file size.

set hive.execution.engine=tez; -- use the Tez execution engine
set hive.merge.tezfiles=true; -- enable the merge step for Tez jobs
set hive.merge.smallfiles.avgsize=128000000; -- 128MB: merge if the average output file is smaller than this
set hive.merge.size.per.task=128000000; -- 128MB: target size of each merged file

If you want to stay with the MR engine, add the following settings instead (I haven't tried this personally):

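set hive.merge.mapfiles=true; -- enable the merge step for map-only jobs (default: true)
set hive.merge.mapredfiles=true; -- enable the merge step for map-reduce jobs
set hive.merge.smallfiles.avgsize=128000000; -- 128MB: merge if the average output file is smaller than this
set hive.merge.size.per.task=128000000; -- 128MB: target size of each merged file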

With these settings, Hive adds an extra merge step whenever the average output file size is below hive.merge.smallfiles.avgsize. Each merged part file should then be approximately 128MB.
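The arithmetic behind these two settings can be checked with a small Python sketch. It assumes the behavior described in the Hive configuration documentation:

- A merge runs only when the average output file size is below hive.merge.smallfiles.avgsize.
- Each merged file ends up roughly hive.merge.size.per.task bytes.

The helper names are hypothetical, and the 4 GB total is a made-up figure, not taken from the question. To land on 30-35 files, divide the real total output size by about 32 and use that as hive.merge.size.per.task.

```python
import math

# Values from the settings above, in bytes (128000000 is ~128 MB).
AVGSIZE = 128_000_000        # hive.merge.smallfiles.avgsize
SIZE_PER_TASK = 128_000_000  # hive.merge.size.per.task


def merge_triggered(total_bytes: int, num_files: int, avgsize: int = AVGSIZE) -> bool:
    """Assumed rule: Hive adds a merge step when the average output file is below avgsize."""
    return total_bytes / num_files < avgsize


def estimated_merged_files(total_bytes: int, size_per_task: int = SIZE_PER_TASK) -> int:
    """Rough count of files left if each merge task writes about size_per_task bytes."""
    return max(1, math.ceil(total_bytes / size_per_task))


def size_per_task_for(total_bytes: int, target_files: int) -> int:
    """Per-task size (bytes) that should give roughly target_files merged files."""
    return math.ceil(total_bytes / target_files)


if __name__ == "__main__":
    # Hypothetical example: 674 output files totalling 4 GB (not from the question).
    total, files = 4_000_000_000, 674
    print(merge_triggered(total, files))   # average ~5.9 MB < 128 MB, so True
    print(estimated_merged_files(total))   # ceil(4e9 / 1.28e8) = 32
    size = size_per_task_for(total, 32)
    print(f"set hive.merge.size.per.task={size};")
```

Treat the estimate as approximate, since the actual file count depends on how the merge job groups its inputs. The real total can be read with `hdfs dfs -du -s` on the table's output directory.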

Answered Oct 24 '22 16:10 by Ambrish
