
What would be a good application for an enhanced version of MapReduce that shares information between Mappers?

I am building an enhancement to the Spark framework (http://www.spark-project.org/). Spark is a project out of UC Berkeley that does MapReduce quickly in RAM. Spark is built in Scala.

The enhancement I'm building allows some data to be shared between the mappers while they are computing. This can be useful, for example, if each of the mappers is looking for an optimal solution and they all want to share the current best solution so that bad candidates can be pruned early. The shared value may be slightly out of date as it propagates, but it should still speed up the search. In general, this is called the branch-and-bound approach.

We can share monotonically increasing numbers, and we can also share arrays and dictionaries.
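To make the idea concrete, here is a minimal single-machine sketch in Python, not the actual Spark enhancement. Two threads stand in for mappers, each exploring half of a 0/1 knapsack search tree, and both prune against a shared, monotonically increasing best value. The names `SharedBest`, `search`, `mapper`, and `solve` are hypothetical and chosen only for illustration; the sketch assumes a non-empty item list.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedBest:
    """Hypothetical shared value that only ever increases."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def get(self):
        # A stale read only weakens pruning; it never makes the result wrong.
        return self._value

    def offer(self, candidate):
        # Keep the maximum so the shared value is monotonically increasing.
        with self._lock:
            if candidate > self._value:
                self._value = candidate


def search(items, capacity, idx, value, best):
    """Depth-first 0/1 knapsack search pruned with the shared bound."""
    best.offer(value)
    if idx == len(items):
        return
    # Optimistic bound: pretend every remaining item fits.
    if value + sum(v for v, _ in items[idx:]) <= best.get():
        return  # this subtree cannot beat the shared best, so prune it
    v, w = items[idx]
    if w <= capacity:
        search(items, capacity - w, idx + 1, value + v, best)
    search(items, capacity, idx + 1, value, best)


def mapper(items, capacity, take_first, best):
    """One 'mapper' explores the subtree fixed by the first item's decision."""
    v, w = items[0]
    if take_first:
        if w <= capacity:
            search(items, capacity - w, 1, v, best)
    else:
        search(items, capacity, 1, 0, best)


def solve(items, capacity):
    """items is a non-empty list of (value, weight) pairs."""
    best = SharedBest()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(mapper, items, capacity, t, best)
                   for t in (True, False)]
        for f in futures:
            f.result()
    return best.get()
```

Because the shared value never decreases, a mapper that reads an old value simply prunes less than it could have. That is why slightly stale propagation costs speed but not correctness in this sketch.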

We are also looking at machine learning applications where the mappers compute local natural-gradient information, and then a new current best solution is shared among all nodes.
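A rough sketch of that pattern follows, with two loud simplifications: it uses plain gradient descent rather than natural gradient, and it runs in one process with a list of partitions standing in for mappers. The function names `local_gradient` and `fit`, the linear model, and the learning rate are assumptions made for the example.

```python
def local_gradient(partition, w, b):
    """Map step: gradient of the summed squared error on one partition."""
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(partition)


def fit(partitions, steps=2000, lr=0.05):
    """Fit y = w * x + b by repeatedly combining per-partition gradients."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each "mapper" sees only its own partition plus the shared (w, b).
        grads = [local_gradient(p, w, b) for p in partitions]
        n = sum(g[2] for g in grads)
        gw = sum(g[0] for g in grads) / n
        gb = sum(g[1] for g in grads) / n
        # The updated solution is what would be shared with all nodes.
        w -= lr * gw
        b -= lr * gb
    return w, b
```

For example, `fit([[(0, 1), (1, 3)], [(2, 5), (3, 7)]])` converges toward `w = 2`, `b = 1`, since the points lie on the line y = 2x + 1.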

What are some other good real-world applications of this kind of enhancement? What kinds of real, useful applications might benefit from a MapReduce computation with just a little bit of information sharing between mappers? What applications use MapReduce or Hadoop right now but are just a little too slow because of the independence restriction of the map phase?

The benefit can be either to speed up the map phase or to improve the solution.

Joseph Perla asked Oct 10 '22 00:10


1 Answer

"The enhancement I'm building allows some data to be shared between the mappers while they are computing."

Apache Giraph is based on Google Pregel, which in turn is based on the Bulk Synchronous Parallel (BSP) model, and is used for graph processing. In BSP, processes share data with each other during the communication phase.

Giraph depends on Hadoop for its implementation. In general, there is no communication between the mappers in MapReduce, but in Giraph the mappers communicate with each other during the communication phase of BSP.
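As a rough illustration of the BSP pattern (not Giraph's or Hama's actual API), here is a single-process Python sketch in which every vertex ends up with the largest value in its connected component. Each pass of the loop is one superstep: vertices compute on the messages they received, send new messages, and those messages become visible only after the barrier. The function name `bsp_max` and the dictionary-based graph format are assumptions for the example, and every neighbour is assumed to appear as a key in `graph`.

```python
def bsp_max(graph, values):
    """graph: vertex -> list of neighbours; values: vertex -> initial number."""
    values = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])

    # Keep running supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            # Compute phase: adopt the largest value seen so far.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                # Communication phase: tell neighbours about the change.
                for n in graph[v]:
                    outbox[n].append(values[v])
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox

    return values
```

For instance, with `graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}` and initial values `{"a": 3, "b": 6, "c": 2}`, every vertex finishes with the value 6.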

You might also be interested in Apache Hama, which implements BSP and can be used for more than graph processing.

There may be reasons why mappers don't communicate in MapReduce. Have you considered them in your enhancement?

"What are some other good real-world applications of this kind of enhancement?"

Graph processing is one thing I can think of, similar to Giraph. Check out the different use cases for BSP; some might be applicable to this kind of enhancement. I am also very interested in what others have to say on this.

Praveen Sripati answered Oct 13 '22 11:10