I have the following hive query:
select count(distinct id) as total from mytable;
which automatically spawns:
1408 Mappers
1 Reducer
I need to manually set the number of reducers and I have tried the following:
set mapred.reduce.tasks=50
set hive.exec.reducers.max=50
but none of these settings seem to be honored. The query takes forever to run. Is there a way to manually set the reducers or maybe rewrite the query so it can result in more reducers? Thanks!
The number of reducers can be set in two ways:

From the command line: when running the MapReduce job, the number of reducers can be specified with the property mapred.reduce.tasks.

In the driver program: call setNumReduceTasks with the desired value on the job object, e.g. job.setNumReduceTasks(5); alternatively, set the mapred.reduce.tasks property in the job configuration.

Yes, the number of reducers can be set to zero. The job is then map-only: the output is not sorted and is written directly to HDFS. If you want the mapper output to be sorted, you can use an Identity reducer.
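For the command-line route on a plain MapReduce job, the same property can be passed with -D, provided the driver goes through ToolRunner/GenericOptionsParser. A minimal sketch (the jar, class and paths below are placeholders, not from the question):

$ hadoop jar myjob.jar com.example.MyDriver -D mapred.reduce.tasks=50 /input /output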
Writing a query in Hive like this:
SELECT COUNT(DISTINCT id) ....
will always use only one reducer, because the final distinct aggregation has to be performed by a single reducer. You should:
use this command to set the desired number of reducers:
set mapred.reduce.tasks=50
rewrite the query as follows:
SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM ... ) t;
This results in 2 map+reduce jobs instead of one, but the performance gain will be substantial.
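Putting the two steps together, a sketch of the full Hive session against the table from the question (50 is just the target from the question; tune it to your data):

set mapred.reduce.tasks=50;
SELECT COUNT(*) FROM (
  SELECT DISTINCT id FROM mytable
) t;

The inner SELECT DISTINCT is spread across the 50 reducers; the final COUNT(*) still runs on a single reducer, but it only has to count rows that are already de-duplicated, so it is cheap.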
The number of reducers also depends on the size of the input data.
By default Hive allocates one reducer per 1 GB (1,000,000,000 bytes) of input. You can change that by setting the property hive.exec.reducers.bytes.per.reducer:
either by changing hive-site.xml
<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000</value>
</property>
or using set
$ hive -e "set hive.exec.reducers.bytes.per.reducer=1000000"