 

What is the computational complexity of the MapReduce overhead

Given that the complexities of the map and reduce tasks are O(map) = f(n) and O(reduce) = g(n), has anybody taken the time to write down how the Map/Reduce intrinsic operations (sorting, shuffling, sending data, etc.) increase the computational complexity? What is the overhead of the Map/Reduce orchestration?
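For concreteness, here is a minimal single-process sketch (my own illustration, with made-up function names rather than any particular framework's API) of where those intrinsic operations sit relative to f(n) and g(n):

    # Minimal single-process sketch of a MapReduce-style computation, just to
    # show where the "intrinsic" work sits. Function names are made up for
    # illustration, not taken from any particular framework's API.
    from collections import defaultdict

    def map_reduce(records, map_fn, reduce_fn):
        # Map phase: O(f(n)) -- the user-supplied map_fn over the n records
        mapped = []
        for rec in records:
            mapped.extend(map_fn(rec))            # emits (key, value) pairs

        # Shuffle/group phase: pure framework overhead. Hash-grouping the m
        # intermediate pairs costs O(m); sort-based implementations pay
        # O(m log m) instead, plus partitioning and network transfer once
        # the nodes are actually distributed.
        groups = defaultdict(list)
        for key, value in mapped:
            groups[key].append(value)

        # Reduce phase: O(g(n)) -- the user-supplied reduce_fn per key group
        return {key: reduce_fn(key, values) for key, values in groups.items()}

    # Word count as the usual example
    counts = map_reduce(
        ["a b a", "b c"],
        map_fn=lambda line: [(w, 1) for w in line.split()],
        reduce_fn=lambda key, values: sum(values),
    )
    print(counts)   # {'a': 2, 'b': 2, 'c': 1}

In a distributed run, the grouping step is where serialization, shuffling over the network and merging come in, and that is the orchestration overhead I am asking about.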

I know this is nonsense when your problem is big enough (just don't worry about the inefficiencies), but for small problems that can run on a small machine or a couple of machines, should I go through the pain of designing parallel algorithms when I already have a Map/Reduce implementation at hand?

asked Jul 30 '10 by tonicebrian


1 Answer

For small problems "that can run on a small machine or a couple of machines," yes, you should rewrite them if performance is essential. As others have pointed out, communication overhead is high.

I don't think anybody has done any complexity analysis of M/R operations, because it's so heavily implementation-, machine-, and algorithm-specific. You end up with many variables just for, say, sorting:

O(n log n * s * (1/p)) where:
 - n is the number of items
 - s is the number of nodes
 - p is the ping time between nodes (assuming equal ping times between all nodes in the network)

Does that make sense? It gets really messy really quick. M/R is also a programming framework, not an algorithm in and of itself, and complexity analysis is usually reserved for algorithms.
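To see how quickly the knobs pile up, here is a toy cost estimator built straight from that formula (a rough sketch only; it treats the expression as a relative score and is not a validated model of any real M/R implementation):

    # A toy cost estimator built straight from the formula above, purely to
    # show how many knobs there are even in this oversimplified model. It is
    # NOT a validated model of any real MapReduce implementation.
    import math

    def sort_cost(n_items, n_nodes, ping_ms):
        # O(n log n * s * (1/p)) from above, treated as a relative score
        return n_items * math.log2(n_items) * n_nodes * (1.0 / ping_ms)

    print(sort_cost(1_000_000, 10, 5.0))    # ~4e7 "cost units"
    print(sort_cost(1_000_000, 100, 20.0))  # more nodes, slower pings: ~1e8

Add key skew, disk spills, or heterogeneous hardware and the model only gets worse.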

The closest thing to what you're looking for may be complexity analysis of multi-threaded algorithms, which is much simpler.
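For a flavor of what that looks like, the usual work-span model (a standard textbook result, not specific to M/R) bounds the running time on P processors roughly as:

T_P <= T_1/P + T_inf where:
 - T_1 is the total work (the running time on a single processor)
 - T_inf is the span, i.e. the length of the critical path
 - P is the number of processors

For example, summing n numbers with a balanced reduction tree has T_1 = O(n) and T_inf = O(log n), so T_P = O(n/P + log n).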

answered Sep 20 '22 by The Alchemist