I have ran a map only job with 674 mappers which hive took an has generated 674 .gz files I want to merge these files to aroung 30-35 files.have tried hive megre mapfilse property by not getting the merged output
Try using TEZ
execution engine and then hive.merge.tezfiles
. You might also want to specify the size as well.
set hive.execution.engine=tez; -- TEZ execution engine
set hive.merge.tezfiles=true; -- Notifying that merge step is required
set hive.merge.smallfiles.avgsize=128000000; --128MB
set hive.merge.size.per.task=128000000; -- 128MB
If you want to go with MR
engine then add following settings (I haven't tried it personally)
set hive.merge.mapredfiles=true; -- Notifying that merge step is required
set hive.merge.smallfiles.avgsize=128000000; --128MB
set hive.merge.size.per.task=128000000; -- 128MB
Above setting will spawn one more step to merge the files and approx size of each part file should be 128MB.
Reference:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With