Simple Java Map/Reduce framework [closed]

Tags:

mapreduce

Can anyone point me at a simple, open-source Map/Reduce framework/API for Java? There doesn't seem to be much evidence of such a thing existing, but someone else might know otherwise.

The best I can find is, of course, Hadoop MapReduce, but that fails the "simple" criterion. I don't need the ability to run distributed jobs, just something that lets me run map/reduce-style jobs on a multi-core machine, in a single JVM, using standard Java 5-style concurrency.

It's not a hard thing to write oneself, but I'd rather not have to.
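For a sense of how little is needed, here is a minimal sketch of a single-JVM map/reduce helper that uses only java.util.concurrent (available since Java 5) and anonymous classes. Everything in it (SimpleMapReduce, Mapper, Reducer, run) is hypothetical and not an existing library. The map calls run in parallel on a fixed-size pool, while the reduce runs sequentially on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SimpleMapReduce {

    /** Hypothetical mapper: turns one input item into an intermediate value. */
    public interface Mapper<I, M> {
        M map(I input);
    }

    /** Hypothetical reducer: folds one intermediate value into the accumulator. */
    public interface Reducer<M, R> {
        R reduce(R accumulator, M value);
    }

    /** Runs map() for every input on a fixed pool, then reduces the results in input order. */
    public static <I, M, R> R run(List<I> inputs, final Mapper<I, M> mapper,
                                  Reducer<M, R> reducer, R initial)
            throws InterruptedException, ExecutionException {
        int procs = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(procs);
        try {
            // Map phase: one Callable per input, all submitted to the pool.
            List<Future<M>> futures = new ArrayList<Future<M>>();
            for (final I input : inputs) {
                futures.add(pool.submit(new Callable<M>() {
                    public M call() {
                        return mapper.map(input);
                    }
                }));
            }
            // Reduce phase: get() blocks until the corresponding map task has finished.
            R result = initial;
            for (Future<M> f : futures) {
                result = reducer.reduce(result, f.get());
            }
            return result;
        } finally {
            pool.shutdown(); // lets the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> words = java.util.Arrays.asList("map", "reduce", "java");
        Integer total = run(words,
                new Mapper<String, Integer>() {
                    public Integer map(String s) { return s.length(); }
                },
                new Reducer<Integer, Integer>() {
                    public Integer reduce(Integer acc, Integer v) { return acc + v; }
                },
                0);
        System.out.println(total); // 3 + 6 + 4 = 13
    }
}
```

Keeping the reduce sequential keeps the helper simple; a parallel (tree-shaped) reduce would only pay off when the reduction itself is expensive.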


asked Mar 10 '11 13:03

skaffman

3 Answers

Have you checked out Akka? While Akka is really a distributed, actor-model-based concurrency framework, you can implement a lot of things simply, with little code. It makes it easy to divide work into pieces, and it automatically takes full advantage of a multi-core machine, as well as being able to use multiple machines to process work. Compared with using threads directly, it feels more natural to me.

I have a Java map/reduce example using Akka. It's not the simplest map/reduce example, since it makes use of futures, but it should give you a rough idea of what's involved. My example demonstrates three main things:

How to divide the work.
How to assign the work: Akka has a really simple messaging system as well as a work partitioner, whose schedule you can configure. Once I learned how to use it, I couldn't stop. It's simple and flexible, and I was using all four of my CPU cores in no time. This is really great for implementing services.
How to know when the work is done and the result is ready to process: this is probably the most difficult and confusing part to understand unless you're already familiar with futures. You don't need to use futures, since there are other options; I used them because I wanted something shorter for people to grok.
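The three steps above can be illustrated without Akka. The sketch below swaps in the JDK's own CompletableFuture (Java 8+) in place of Akka's actors and futures, so it is not Akka's API. The names FutureSplitJoin, parallelSum and sumRange are hypothetical, the "work" is just summing a range of numbers, and it assumes pieces > 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FutureSplitJoin {

    /** The work done by one piece: sums the integers in [from, to). */
    static long sumRange(long from, long to) {
        long s = 0;
        for (long i = from; i < to; i++) s += i;
        return s;
    }

    /** Sums 0..n-1 by splitting the range into at most `pieces` chunks (pieces must be > 0). */
    public static long parallelSum(long n, int pieces) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            // 1. Divide the work: one [from, to) range per piece.
            long chunk = (n + pieces - 1) / pieces;
            List<CompletableFuture<Long>> parts = new ArrayList<>();
            for (long from = 0; from < n; from += chunk) {
                final long lo = from;
                final long hi = Math.min(n, from + chunk);
                // 2. Assign the work: each piece runs asynchronously on the pool.
                parts.add(CompletableFuture.supplyAsync(() -> sumRange(lo, hi), pool));
            }
            // 3. Know when it's done: allOf completes once every piece has finished.
            CompletableFuture<Void> all =
                    CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]));
            return all.thenApply(v -> parts.stream()
                            .mapToLong(CompletableFuture::join) // every part is already complete here
                            .sum())
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000, 8)); // 499999500000
    }
}
```

allOf only signals completion; the individual results are collected afterwards with join(), which does not block at that point because every future has already finished.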

If you have any questions, Stack Overflow has an excellent Akka Q&A section.

answered Oct 23 '22 08:10

chaostheory

I think it is worth mentioning that these problems are history as of Java 8. An example:

int heaviestBlueBlock =
    blocks.stream()
          .filter(b -> b.getColor() == BLUE)
          .map(Block::getWeight)
          .reduce(0, Integer::max);

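In other words, single-node map/reduce is built into Java 8 through the Streams API, and calling parallelStream() instead of stream() splits the work across the available cores.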
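A self-contained version of the snippet above, as a sketch: Block and Color are hypothetical stand-ins, since the answer doesn't define them. It uses mapToInt to avoid boxing, and parallelStream() so the filter/map/reduce pipeline runs on the common ForkJoinPool.

```java
import java.util.Arrays;
import java.util.List;

public class HeaviestBlue {

    public enum Color { RED, BLUE, GREEN }

    // Hypothetical stand-in for the Block type used in the answer.
    public static final class Block {
        private final Color color;
        private final int weight;

        public Block(Color color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        public Color getColor() { return color; }
        public int getWeight() { return weight; }
    }

    public static int heaviestBlue(List<Block> blocks) {
        return blocks.parallelStream()                   // work is split across the common ForkJoinPool
                .filter(b -> b.getColor() == Color.BLUE)
                .mapToInt(Block::getWeight)              // "map" step
                .reduce(0, Integer::max);                // "reduce" step; 0 if there are no blue blocks
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(Color.BLUE, 3), new Block(Color.RED, 10),
                new Block(Color.BLUE, 7), new Block(Color.GREEN, 5));
        System.out.println(heaviestBlue(blocks)); // 7
    }
}
```

Because max is associative and 0 is an identity for non-negative weights, the parallel reduction gives the same answer as the sequential one.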

For more details, see Brian Goetz's presentation about Project Lambda.

answered Oct 23 '22 09:10

Lukas Eder

I use the following structure, with a fixed thread pool sized to the number of cores:

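import java.util.*;
import java.util.concurrent.*;

int procs = Runtime.getRuntime().availableProcessors();
ExecutorService es = Executors.newFixedThreadPool(procs);

// map: submit each Task (a Callable<TaskResult>) to the pool
List<Future<TaskResult>> results = new ArrayList<>();
for (int i = 0; i < tasks; i++)
    results.add(es.submit(new Task(i)));
// reduce: combine the results in submission order (Future.get() blocks until a task finishes)
for (Future<TaskResult> future : results)
    reduce(future);
es.shutdown(); // let the pool's threads exit once all tasks have finished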
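The structure above, filled in as a runnable sketch. The answer leaves Task, TaskResult, tasks and reduce undefined, so here Task is a hypothetical Callable<Long> that counts the primes in one slice of a range, and the reduce step simply sums the per-slice counts. ExecutorMapReduce and countPrimesBelow are also illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorMapReduce {

    // Hypothetical task: counts the primes in its slice [from, to).
    static class Task implements Callable<Long> {
        private final int from, to;

        Task(int from, int to) { this.from = from; this.to = to; }

        @Override
        public Long call() {
            long count = 0;
            for (int n = Math.max(from, 2); n < to; n++) {
                if (isPrime(n)) count++;
            }
            return count;
        }

        private static boolean isPrime(int n) {
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) return false;
            }
            return true;
        }
    }

    public static long countPrimesBelow(int limit, int tasks) throws Exception {
        int procs = Runtime.getRuntime().availableProcessors();
        ExecutorService es = Executors.newFixedThreadPool(procs);
        try {
            int slice = (limit + tasks - 1) / tasks;
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                int from = i * slice;
                int to = Math.min(limit, from + slice);
                results.add(es.submit(new Task(from, to)));   // map: one slice per task
            }
            long total = 0;
            for (Future<Long> future : results) {
                total += future.get();                         // reduce: sum the partial counts
            }
            return total;
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countPrimesBelow(100, 4)); // 25
    }
}
```

Using more tasks than cores is harmless here: the fixed pool simply queues the extra slices, which also helps balance uneven work between slices.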

answered Oct 23 '22 10:10

Peter Lawrey
