
MapReduce: more reducers than mappers?

In my distributed systems course, we began discussing the MapReduce model of distributed computation. What are the benefits of having more reducers than mappers in MapReduce architectures?

Note: Searching Google for this question turns up conflicting opinions.

Mike G asked Aug 02 '13 17:08



1 Answer

If your input data is small, you don't need many mappers running to process the input files in parallel.
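To make the mapper side concrete, here is a rough sketch. It assumes a Hadoop-style framework that starts one mapper per fixed-size input split. The function name `estimated_map_tasks` and the 128 MB split size are assumptions chosen for illustration and are not part of the original answer.

```python
import math


def estimated_map_tasks(input_bytes, split_bytes=128 * 1024 * 1024):
    """Rough mapper count: one mapper per input split.

    Assumes fixed-size splits; 128 MB is an illustrative default,
    not a value taken from the answer.
    """
    if input_bytes <= 0:
        return 0
    return math.ceil(input_bytes / split_bytes)


small_input = 50 * 1024 * 1024   # 50 MB of input
large_input = 10 * 1024 ** 3     # 10 GB of input

print("mappers for small input:", estimated_map_tasks(small_input))
print("mappers for large input:", estimated_map_tasks(large_input))
```

Under these assumptions a small input yields a single mapper, however many keys that mapper ends up emitting, so the mapper count says little about how much reduce-side work there is.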

However, if the <key,value> pairs generated by the mappers are numerous and diverse, it makes sense to have more reducers, because you can then process more <key,value> pairs in parallel.

Let's consider a case where your mapper output has 10 keys, with 100 values associated with each key. If you have 10 reducers, you can process all the keys in parallel.

Now suppose your mappers output 100 keys with 10 values for each key. Then 100 reducers can process all of your keys in parallel. (Of course, running 100 reducers at once incurs network costs.)
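The two scenarios above can be simulated with a small sketch. It assumes keys are assigned to reducers by hash partitioning (the default in frameworks such as Hadoop). `zlib.crc32` stands in for the framework's hash function, and the names `partition` and `reducer_loads` are hypothetical helpers, not part of any MapReduce API.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for a key with a stable hash (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(value_counts, num_reducers):
    """Return how many values each reducer receives.

    value_counts maps each key to the number of values emitted for it.
    All values for one key go to the same reducer.
    """
    loads = [0] * num_reducers
    for key, n_values in value_counts.items():
        loads[partition(key, num_reducers)] += n_values
    return loads


# The two scenarios from the answer: 10 keys x 100 values, 100 keys x 10 values.
few_keys = {f"key{i}": 100 for i in range(10)}
many_keys = {f"key{i}": 10 for i in range(100)}

for name, data in [("10 keys x 100 values", few_keys),
                   ("100 keys x 10 values", many_keys)]:
    for r in (1, 10, 100):
        loads = reducer_loads(data, r)
        busy = sum(1 for load in loads if load > 0)
        print(f"{name}, {r} reducers: busiest load={max(loads)}, busy reducers={busy}")
```

Two caveats show up in this sketch. Hash partitioning does not guarantee exactly one key per reducer, so some reducers may receive several keys while others receive none. And because all values for a key go to a single reducer, any reducers beyond the number of distinct keys sit idle: with only 10 keys, at most 10 of 100 reducers have any work.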

So based on the type of data that your mappers output, you can decide on the optimal number of reducers.

Chaos answered Sep 24 '22 10:09
