In my distributed systems course, we began discussing the MapReduce model of distributed computation. What are the benefits of having more reducers than mappers in MapReduce architectures?
Note: searching Google for this question turns up conflicting opinions on the matter.
If your data size is small, you don't need many mappers running in parallel to process the input files.
However, if the <key, value> pairs generated by the mappers are numerous and diverse (many distinct keys), it makes sense to have more reducers, because more keys can then be processed in parallel.
Let's consider a case where your mapper output has 10 keys, with 100 values associated with each key. If you have 10 reducers, you can process all the keys in parallel.
Now suppose your mappers output 100 keys with 10 values per key. Then having 100 reducers lets you process all of your keys in parallel (of course, there are network costs involved in having 100 reducers running at once). The sketch below illustrates how keys get spread across reducers in both cases.
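To make the parallelism limit concrete, here is a minimal sketch in plain Python (not tied to any particular MapReduce framework) of the hash partitioning most frameworks use to assign keys to reducers; the key names and counts are invented to mirror the two scenarios above.

```python
from collections import defaultdict

def partition(keys, num_reducers):
    """Assign each distinct mapper output key to a reducer,
    mimicking the common scheme: hash(key) % num_reducers."""
    assignment = defaultdict(list)
    for key in keys:
        assignment[hash(key) % num_reducers].append(key)
    return assignment

# Scenario 1: 10 distinct keys, 10 reducers.
# At most one key per reducer is possible, so adding reducers beyond 10
# cannot add parallelism (hash collisions may even leave some reducers idle).
few_keys = [f"key{i}" for i in range(10)]
print({r: len(ks) for r, ks in partition(few_keys, 10).items()})

# Scenario 2: 100 distinct keys, 10 reducers.
# Each reducer handles roughly 10 keys serially, so raising the reducer
# count toward 100 lets more keys be reduced in parallel.
many_keys = [f"key{i}" for i in range(100)]
print({r: len(ks) for r, ks in partition(many_keys, 10).items()})
```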
So, based on the shape of your mapper output (how many distinct keys, and how many values per key), you can decide on the optimal number of reducers.
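In practice the reducer count is something you set explicitly on the job. As one hedged illustration, here is how it might look with the mrjob library submitting a word-count job to Hadoop, where the standard mapreduce.job.reduces property controls the number of reduce tasks; the job itself is made up for the example, and local test runners simply ignore the setting.

```python
from mrjob.job import MRJob

class WordCount(MRJob):
    # Assumption: the job runs on a Hadoop cluster, where
    # mapreduce.job.reduces controls how many reduce tasks are launched.
    JOBCONF = {"mapreduce.job.reduces": "10"}

    def mapper(self, _, line):
        # Emit one <word, 1> pair per word in the input line.
        for word in line.split():
            yield word, 1

    def reducer(self, key, values):
        # All counts for a given word arrive at the same reducer.
        yield key, sum(values)

if __name__ == "__main__":
    WordCount.run()
```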