 

Hadoop slowstart configuration

Tags:

hadoop

What's an ideal value for "mapred.reduce.slowstart.completed.maps" for a Hadoop job? What are the rules to follow to set it appropriately?

Thanks!

asked Jul 06 '12 by user414585

1 Answer

It depends on a number of characteristics of your job, cluster and utilization:

  1. How many map slots your job requires vs the cluster's maximum map capacity: if you have a job that spawns thousands of map tasks but only 10 map slots in total (an extreme case to demonstrate the point), then starting your reducers early would tie up reducer slots and starve other jobs' reduce tasks. In this case I would set slowstart to a large value (0.999 or 1.0). The same applies if your mappers take a long time to complete - let someone else use the reducers in the meantime.

  2. If your cluster is relatively lightly loaded (there is no contention for reducer slots) and your mappers output a good volume of data, then a low slowstart value will help your job finish earlier: the map output is shuffled to the reducers while the remaining map tasks are still executing.

There are probably more factors to consider.
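For reference, here is a sketch of how the property can be set, either cluster-wide in `mapred-site.xml` or per job on the command line (assuming your job's driver uses `ToolRunner`/`GenericOptionsParser`, which supports `-D` overrides). The job jar and class names below are placeholders:

```xml
<!-- mapred-site.xml: hold reducers back until 95% of map tasks are done -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.95</value>
</property>
```

```shell
# Per-job override via a generic option (placeholder jar/class names)
hadoop jar my-job.jar com.example.MyJob \
  -D mapred.reduce.slowstart.completed.maps=0.95 \
  /input /output
```

Note that in newer Hadoop releases the property was renamed to `mapreduce.job.reduce.slowstart.completedmaps`; the old name is still accepted as a deprecated alias.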

answered Oct 09 '22 by Chris White