
Difference between Threading and Map-Reduce processing?

One of my colleagues is arguing with me about introducing the map-reduce concept into our application (text processing). His view is that we should use threading instead. We are both new to the map-reduce paradigm. I thought that using map-reduce relieves the developer of the overhead of handling thread synchronisation, deadlocks, and shared data. Is there any other reason to choose map-reduce over threading?

udi asked Dec 11 '12 07:12


2 Answers

There is a related paper on this: Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.

Threading (fork/join in the paper) lets you partition a task into several subtasks in a recursive fashion: there can be several tiers, the forked subtasks can communicate with each other at this stage, and the programming style is much more traditional. It does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.
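To make the recursive split concrete, here is a minimal sketch using Java's fork/join framework. The class name `ForkJoinWordCount`, the `THRESHOLD` value, and the choice of word counting as the task are all assumptions made for illustration; they are not taken from the paper.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: counts whitespace-separated words in an array of lines
// by recursively splitting the array in half on a single machine.
class ForkJoinWordCount extends RecursiveTask<Integer> {
    // Illustrative cutoff: ranges this small are counted directly.
    private static final int THRESHOLD = 2;

    private final String[] lines;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinWordCount(String[] lines, int from, int to) {
        this.lines = lines;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        if (to - from <= THRESHOLD) {
            // Base case: count the words sequentially.
            int words = 0;
            for (int i = from; i < to; i++) {
                String line = lines[i].trim();
                if (!line.isEmpty()) {
                    words += line.split("\\s+").length;
                }
            }
            return words;
        }
        // Recursive case: split in two, run the left half asynchronously,
        // compute the right half in this thread, then join the results.
        int mid = (from + to) >>> 1;
        ForkJoinWordCount left = new ForkJoinWordCount(lines, from, mid);
        ForkJoinWordCount right = new ForkJoinWordCount(lines, mid, to);
        left.fork();
        return right.compute() + left.join();
    }

    // Convenience entry point that runs the task on the common pool.
    static int count(String[] lines) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinWordCount(lines, 0, lines.length));
    }
}
```

Calling `ForkJoinWordCount.count(lines)` on an in-memory array is all it takes; the whole input has to fit in the memory of one machine, which is the limit the paper points out.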

MapReduce does only one big split: the mapped splits do not talk to each other at all, and then everything is reduced together. It is a single tier with no inter-split communication until the reduce step, and it is massively scalable. Great for taking advantage of your share of the cloud.
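The shape of that single split can be sketched in plain Java. This is only an in-process illustration of the map and reduce phases, not a real MapReduce job: an actual framework would run the splits on different machines. The names `MapReduceSketch`, `mapSplit`, and `reduce` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-JVM illustration of the map-reduce shape for word counting.
class MapReduceSketch {
    // Map phase: each split is processed on its own and shares no state
    // with any other split.
    static Map<String, Integer> mapSplit(String split) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: combine two partial results into a new map.
    // Neither input is mutated, so the operation is safe to apply in any grouping.
    static Map<String, Integer> reduce(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> merged = new HashMap<>(a);
        b.forEach((word, n) -> merged.merge(word, n, Integer::sum));
        return merged;
    }

    // One split, one map pass, one reduce: no communication between splits.
    static Map<String, Integer> wordCount(String[] splits) {
        return Arrays.stream(splits)
                .parallel()
                .map(MapReduceSketch::mapSplit)
                .reduce(new HashMap<>(), MapReduceSketch::reduce);
    }
}
```

Because `mapSplit` never looks outside its own split, the splits could in principle be shipped to separate hosts without changing the logic, which is what makes the model scale out.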

user123 answered Oct 05 '22 10:10


Map-reduce adds tons of overhead, but can work to coordinate a large fleet of machines for an "embarrassingly parallel" use case. Threading is only worth it if you have multiple cores and only a single host, but there are many frameworks which add layers of abstraction above raw threads (e.g. Concurrent, Akka) that are easier in general to work with.
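As a rough idea of what such an abstraction layer looks like, here is a small sketch using `ExecutorService` from `java.util.concurrent`, assuming that is the kind of library meant by "Concurrent" above. The class `ExecutorSketch` and the task (summing the lengths of text chunks) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: a thread pool handles thread creation and scheduling,
// so the calling code only describes independent tasks.
class ExecutorSketch {
    static int totalLength(List<String> chunks) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // One task per chunk; tasks share no mutable state.
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String chunk : chunks) {
                tasks.add(() -> chunk.length());
            }
            // invokeAll blocks until every task has finished.
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chunk processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

There are no explicit locks or `Thread` objects here; results come back through `Future` values, which avoids most of the synchronisation concerns raised in the question as long as the tasks stay independent.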

Judge Mental answered Oct 05 '22 11:10