 

Is there a canonical problem that provably can't be aided with map/reduce?

I'm trying to understand the boundaries of Hadoop and map/reduce, and it would help to know of a non-trivial problem, or class of problems, that we know map/reduce can't help with.

It would certainly be interesting if changing a single factor of such a problem would allow it to be simplified with map/reduce.

Thank you

asked Aug 05 '10 by Steven Noble

People also ask

Why is MapReduce not suitable for real time processing?

MapReduce is not inherently able to execute recursive or iterative jobs [12]. Its strictly batch-oriented behavior is another problem: all of the input must be ready before the job starts, which rules MapReduce out of online and stream-processing use cases.

Why is MapReduce not suitable for iterative algorithms?

MapReduce uses coarse-grained tasks to do its work, which are too heavyweight for iterative algorithms. Another problem is that MapReduce has no awareness of the total pipeline of Map plus Reduce steps, so it can't cache intermediate data in memory for faster performance.
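A rough sketch of what that costs in practice: an iterative algorithm has to be driven as one complete MapReduce job per iteration, with the working state written out and re-read every round. The toy driver below is plain Python with made-up file names and a made-up update step, not real Hadoop; it only models the shape of the pattern:

```python
import json, os, tempfile

def run_job(input_path, output_path):
    # Stand-in for submitting one full MapReduce job: read all state
    # from "disk", apply the update, write all state back to "disk".
    with open(input_path) as f:
        state = json.load(f)
    state = [x * 0.5 + 1 for x in state]  # hypothetical per-iteration update
    with open(output_path, "w") as f:
        json.dump(state, f)

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "iter_0.json")
with open(path, "w") as f:
    json.dump([10.0, 20.0, 30.0], f)

# The driver must resubmit a complete job per iteration; nothing stays
# cached in memory between rounds, and every round pays the fixed
# job-startup cost on top of the disk round-trip.
for i in range(5):
    next_path = os.path.join(workdir, f"iter_{i + 1}.json")
    run_job(path, next_path)
    path = next_path

with open(path) as f:
    print(json.load(f))
```

Frameworks that keep the working set in memory between iterations avoid exactly this per-round disk round-trip.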

What types of problems is MapReduce most suitable to solve?

Anything that involves performing operations on a large data set, where the problem can be broken down into smaller independent sub-problems whose results can then be aggregated to produce the answer to the larger problem. A trivial example would be calculating the sum of a huge set of numbers.
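A minimal local sketch of that sum example (plain Python; the data and chunk size are illustrative, and a real job would distribute the map calls across machines rather than run them in one process):

```python
def mapper(lines):
    # Each map task sums its own independent slice of the input.
    yield sum(int(line) for line in lines if line.strip())

def reducer(partials):
    # The reduce step aggregates the independent partial results.
    return sum(partials)

if __name__ == "__main__":
    numbers = [str(n) for n in range(1, 1_000_001)]
    # Map phase: break the problem into independent sub-problems ...
    chunks = [numbers[i:i + 250_000] for i in range(0, len(numbers), 250_000)]
    partials = [p for chunk in chunks for p in mapper(chunk)]
    # Reduce phase: ... and aggregate their results into the answer.
    print(reducer(partials))  # 500000500000
```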


1 Answer

Two things come to mind:

  1. Anything that requires real-time, interactive, or low-latency response times. There is a fixed startup cost incurred for any job submitted to Hadoop.

  2. Any problem that is not embarrassingly parallel. Hadoop can handle many problems that involve simple interdependencies between records, since records sharing a key are joined during the reduce phase (a sketch of such a reduce-side join follows this list). However, certain graph-processing and machine-learning algorithms are difficult to write in Hadoop because they involve too many operations that depend on one another. Some machine-learning algorithms also require very low-latency, random access to a large data set, which Hadoop does not provide out of the box.
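To make the "joined during the reduce phase" point concrete, here is a minimal local sketch of a reduce-side join: the map phase tags records from each dataset, the shuffle groups them by key, and the reducer combines the two sides. The data and names are invented for illustration; no Hadoop is involved:

```python
from collections import defaultdict

users = {1: "alice", 2: "bob"}
orders = [(1, "book"), (2, "lamp"), (1, "pen")]

# Map phase: tag each record with its source so the reducer can tell
# the two datasets apart after the shuffle.
mapped = [(uid, ("user", name)) for uid, name in users.items()]
mapped += [(uid, ("order", item)) for uid, item in orders]

# Shuffle phase: group all records sharing a key onto one reducer.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: join the tagged records for each key.
for uid, values in sorted(groups.items()):
    name = next(v for tag, v in values if tag == "user")
    items = [v for tag, v in values if tag == "order"]
    print(uid, name, items)
# 1 alice ['book', 'pen']
# 2 bob ['lamp']
```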

answered Nov 15 '22 by bajafresh4life