Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple Java Map/Reduce framework [closed]

Tags:

java

mapreduce

Can anyone point me at a simple, open-source Map/Reduce framework/API for Java? There doesn't seem to much evidence of such a thing existing, but someone else might know different.

The best I can find is, of course, Hadoop MapReduce, but that fails the "simple" criteria. I don't need the ability to run distributed jobs, just something to let me run map/reduce-style jobs on a multi-core machine, in a single JVM, using standard Java5-style concurrency.

It's not a hard thing to write oneself, but I'd rather not have to.

like image 779
skaffman Avatar asked Mar 10 '11 13:03

skaffman


People also ask

Is MapReduce a framework?

MapReduce is a framework using which we can write applications to process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner.

What is MapReduce explain the working of MapReduce with an example?

MapReduce is a programming framework that allows us to perform distributed and parallel processing on large data sets in a distributed environment. MapReduce consists of two distinct tasks – Map and Reduce. As the name MapReduce suggests, the reducer phase takes place after the mapper phase has been completed.

What are the four phases of MapReduce framework?

The whole process goes through various MapReduce phases of execution, namely, splitting, mapping, sorting and shuffling, and reducing.

Is MapReduce still used?

Google stopped using MapReduce as their primary big data processing model in 2014. Meanwhile, development on Apache Mahout had moved on to more capable and less disk-oriented mechanisms that incorporated the full map and reduce capabilities.


3 Answers

Have you check out Akka? While akka is really a distributed Actor model based concurrency framework, you can implement a lot of things simply with little code. It's just so easy to divide work into pieces with it, and it automatically takes full advantage of a multi-core machine, as well as being able to use multiple machines to process work. Unlike using threads, it feels more natural to me.

I have a Java map reduce example using akka. It's not the easiest map reduce example, since it makes use of futures; but it should give you a rough idea of what's involved. There are several major things that my map reduce example demonstrates:

  • How to divide the work.
  • How to assign the work: akka has a really simple messaging system was well as a work partioner, whose schedule you can configure. Once I learned how to use it, I couldn't stop. It's just so simple and flexible. I was using all four of my CPU cores in no time. This is really great for implementing services.
  • How to know when the work is done and the result is ready to process: This is actually the portion that may be the most difficult and confusing to understand unless you're already familiar with Futures. You don't need to use Futures, since there are other options. I just used them because I wanted something shorter for people to grok.

If you have any questions, StackOverflow actually has an awesome akka QA section.

like image 82
chaostheory Avatar answered Oct 23 '22 08:10

chaostheory


I think it is worth mentioning that these problems are history as of Java 8. An example:

int heaviestBlueBlock =
    blocks.filter(b -> b.getColor() == BLUE)
          .map(Block::getWeight)
          .reduce(0, Integer::max);

In other words: single-node MapReduce is available in Java 8.

For more details, see Brian Goetz's presentation about project lambda

like image 43
Lukas Eder Avatar answered Oct 23 '22 09:10

Lukas Eder


I use the following structure

int procs = Runtime.getRuntime().availableProcessors();
ExecutorService es = Executors.newFixedThreadPool(procs);

List<Future<TaskResult>> results = new ArrayList();
for(int i=0;i<tasks;i++)
    results.add(es.submit(new Task(i)));
for(Future<TaskResult> future:results)
    reduce(future);
like image 10
Peter Lawrey Avatar answered Oct 23 '22 10:10

Peter Lawrey