Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive unable to manually set number of reducers

Tags:

I have the following hive query:

select count(distinct id) as total from mytable; 

which automatically spawns:
1408 Mappers
1 Reducer

I need to manually set the number of reducers and I have tried the following:

set mapred.reduce.tasks=50  set hive.exec.reducers.max=50 

but none of these settings seem to be honored. The query takes forever to run. Is there a way to manually set the reducers or maybe rewrite the query so it can result in more reducers? Thanks!

like image 753
magicalo Avatar asked Jan 06 '12 17:01

magicalo


People also ask

How do you set the number of reducers?

The number of reducers can be set in two ways as below: Using the command line: While running the MapReduce job, we have an option to set the number of reducers which can be specified by the controller mapred. reduce. tasks.

How do I change the number of reducers in Hadoop?

Ways To Change Number Of ReducersUpdate the driver program and set the setNumReduceTasks to the desired value on the job object. job. setNumReduceTasks(5); There is also a better ways to change the number of reducers, which is by using the mapred.

Can we set the number of reducers to 0?

Yes, we can set the Number of Reducer to zero. This means it is map only. The data is not sorted and directly stored in HDFS. If we want the output from mapper to be sorted ,we can use Identity reducer.


2 Answers

writing query in hive like this:

 SELECT COUNT(DISTINCT id) .... 

will always result in using only one reducer. You should:

  1. use this command to set desired number of reducers:

    set mapred.reduce.tasks=50

  2. rewrite query as following:

SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM ... ) t;

This will result in 2 map+reduce jobs instead of one, but performance gain will be substantial.

like image 172
wlk Avatar answered Sep 19 '22 14:09

wlk


Number of reducers depends also on size of the input file

By default it is 1GB (1000000000 bytes). You could change that by setting the property hive.exec.reducers.bytes.per.reducer:

  1. either by changing hive-site.xml

    <property>    <name>hive.exec.reducers.bytes.per.reducer</name>    <value>1000000</value> </property> 
  2. or using set

    $ hive -e "set hive.exec.reducers.bytes.per.reducer=1000000"

like image 44
user1314742 Avatar answered Sep 23 '22 14:09

user1314742