
How to decrease the number of mappers in Hive when the files are bigger than the block size?

Tags:

hive

mapper

Hi all, I have a Hive table with more than 720 partitions. Each partition holds more than 400 files, and the average file size is 1 GB.

Now I execute the following SQL:

insert overwrite table test_abc select * from DEFAULT.abc A WHERE A.P_HOUR = '2017042400';

This partition (P_HOUR = '2017042400') has 409 files. When I submit this SQL, I get the following output:

INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:409

INFO : Submitting tokens for job: job_1482996444961_9384015

I searched many documents for a way to decrease the number of mappers, but most of them only cover the case where the files are small. I have tried the following settings in beeline, but they did not work.

--------------- first time

set mapred.min.split.size =5000000000;
set mapred.max.split.size =10000000000;
set mapred.min.split.size.per.node=5000000000;
set mapred.min.split.size.per.rack=5000000000;

-----------------second time

set mapreduce.input.fileinputformat.split.minsize =5000000000;
set mapreduce.input.fileinputformat.split.maxsize=10000000000;
set mapreduce.input.fileinputformat.split.minsize.per.rack=5000000000;
set mapreduce.input.fileinputformat.split.minsize.per.node=5000000000;

My Hadoop version is Hadoop 2.7.2 (compiled by root on 11 Jul 2016 10:58:45).
My Hive version, as shown when connecting: Connected to: Apache Hive (version 1.3.0), Driver: Hive JDBC (version 1.3.0)

asked May 23 '26 22:05 by lance


2 Answers

In addition to the settings in your post, set:

set hive.hadoop.supports.splittable.combineinputformat=true;

hive.hadoop.supports.splittable.combineinputformat
- Default Value: false
- Added In: Hive 0.6.0
Whether to combine small input files so that fewer mappers are spawned.
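
A minimal sketch of how this could look in a beeline session, using the query from the question. It assumes that hive.input.format is (or can be set to) CombineHiveInputFormat, which is the usual default; the first line only prints the current value so you can check.

```sql
-- Print the current input format for this session
set hive.input.format;
-- Assumption: fall back to the usual default if something else is configured
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- The flag suggested in this answer
set hive.hadoop.supports.splittable.combineinputformat=true;
-- Re-run the query from the question and check "number of splits" in the log
insert overwrite table test_abc select * from DEFAULT.abc A WHERE A.P_HOUR = '2017042400';
```

If the split count is still 409, the input format or file format in use may not support combining, so it is worth checking those before raising the sizes further.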

answered May 25 '26 22:05 by David דודו Markovitz


With the mr engine (MRv2), splits are combined through CombineInputFormat, while Tez uses grouped splits to determine the number of mappers. If your execution engine is mr and you would like to reduce the number of mappers, use:

set mapreduce.input.fileinputformat.split.maxsize=xxxxx;

If maxSplitSize is specified, then blocks on the same node are combined to a single split. Blocks that are left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, then blocks from the same rack are combined in a single split; no attempt is made to create node-local splits. If the maxSplitSize is equal to the block size, then this class is similar to the default splitting behavior in Hadoop.
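
As a rough illustration for the mr engine, here is a sketch using the 10 GB cap from the question. The value is an assumption for illustration, not a tuned recommendation, and the estimate in the comments assumes about 10^9 bytes per file and ideal packing, which locality constraints may prevent.

```sql
-- Assumption: the mr engine is selected explicitly for this session
set hive.execution.engine=mr;
-- Cap each combined split at roughly 10 GB (value taken from the question)
set mapreduce.input.fileinputformat.split.maxsize=10000000000;
-- Rough best case: 409 files * ~1e9 bytes / 1e10 bytes per split = ~41 splits,
-- compared with the 409 splits reported in the question's log
```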

This link can be helpful for controlling the number of mappers in Hive if your execution engine is mr.

If your execution engine is tez and you would like to control the number of mappers, use:

set tez.grouping.max-size = XXXXXX;
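
With Tez, grouped splits are bounded from both sides, so it may help to set the lower bound as well. A minimal sketch, assuming the same 5 GB and 10 GB targets as in the question; tez.grouping.min-size is not mentioned above and is added here as the companion setting.

```sql
-- Assumption: Tez is available and selected for this session
set hive.execution.engine=tez;
-- Lower bound for a grouped split (assumed value, mirrors the question's 5 GB)
set tez.grouping.min-size=5000000000;
-- Upper bound for a grouped split (assumed value, mirrors the question's 10 GB)
set tez.grouping.max-size=10000000000;
```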

Here is a good reference on parallelism in Hive with the Tez execution engine.

answered May 25 '26 23:05 by Sandeep Singh


