 

Hadoop, Mahout real-time processing alternative

I intended to use Hadoop as a "computation cluster" in my project. However, I then read that Hadoop is not intended for real-time systems because of the overhead associated with starting a job. I'm looking for a solution that could be used this way: jobs that can easily be scaled across multiple machines but do not require much input data. Moreover, I want to run machine learning jobs in real time, e.g. using a previously trained neural network.

What libraries/technologies I can use for this purposes?

mmatloka asked Oct 01 '11 10:10



1 Answer

You are right, Hadoop is designed for batch-type processing.

Reading the question, I thought of the Storm framework, very recently open-sourced by Twitter, which can be considered "Hadoop for real-time processing".

Storm makes it easy to write and scale complex realtime computations on a cluster of computers, doing for realtime processing what Hadoop did for batch processing. Storm guarantees that every message will be processed. And it's fast — you can process millions of messages per second with a small cluster. Best of all, you can write Storm topologies using any programming language.

(from: InfoQ post)
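To make the "no per-job startup overhead" point concrete, here is a minimal single-JVM sketch in plain Java. It is not Storm's API; it only mimics the idea behind a long-running topology: the workers and the trained model are set up once, and each incoming message is then handled by an already-running worker instead of by a freshly launched batch job. `StreamingScorer`, `Model` and `scoreAll` are hypothetical names, and the linear `Model` is an assumed stand-in for something like the neural network from the question. A real Storm deployment would additionally spread the workers across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: NOT Storm's API. All names are hypothetical.
final class StreamingScorer {
    // Hypothetical stand-in for a previously trained model (e.g. a neural network).
    // It is built once and shared, rather than reloaded for every request.
    static final class Model {
        private final double weight;
        private final double bias;

        Model(double weight, double bias) {
            this.weight = weight;
            this.bias = bias;
        }

        // Read-only prediction, so concurrent calls from several workers are safe.
        double predict(double x) {
            return weight * x + bias;
        }
    }

    private final ExecutorService workers; // long-lived pool, started once
    private final Model model;

    StreamingScorer(int parallelism, Model model) {
        this.workers = Executors.newFixedThreadPool(parallelism);
        this.model = model;
    }

    // Hands each incoming message to an already-running worker and
    // returns the predictions in the order the messages were submitted.
    List<Double> scoreAll(List<Double> messages) {
        List<Future<Double>> pending = new ArrayList<>();
        for (double m : messages) {
            pending.add(workers.submit(() -> model.predict(m)));
        }
        List<Double> results = new ArrayList<>();
        try {
            for (Future<Double> f : pending) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scoring failed", e.getCause());
        }
        return results;
    }

    // Stops the worker threads once no more messages are expected.
    void shutdown() {
        workers.shutdown();
    }
}
```

For example, a scorer built with `new StreamingScorer.Model(2.0, 1.0)` maps the messages `1.0, 2.0, 3.0` to `3.0, 5.0, 7.0`. The order is preserved because the futures are collected in submission order, even though the workers may finish in any order.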

However, I have not worked with it yet, so I really cannot say much about it in practice.

Twitter Engineering Blog Post: http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
Github: https://github.com/nathanmarz/storm

dmeister answered Sep 20 '22 12:09
