
Good use of storm?

Tags:

apache-storm

I've been reading about Storm and playing around with the examples from storm-starter.

I think I got the concept and it applies very well to many cases. I have a test project I want to do to learn more about this, but I'm wondering if Storm is really suited for this.

The conceptual problem I'm having is with the 'streaming' definition. It seems that Storm will work like a charm subscribing to a stream and processing it in real time, but I don't really have a stream; I have a finite collection of data that I want to process.

I know there's Hadoop for this, but I'm interested in the real-time capabilities of Storm, as well as other interesting points that Nathan, who wrote Storm, mentions in his talks.

So I was wondering: do people write spouts that poll non-streaming APIs and then diff the results, to emulate a stream?
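For illustration, here's a minimal, framework-agnostic sketch of that poll-and-diff idea in Python (all names here are hypothetical; a real Storm spout would be a Java class implementing `IRichSpout` and emitting from `nextTuple()`):

```python
import time

def poll_and_diff(fetch, interval=5.0):
    """Emulate a stream from a non-streaming API: poll it periodically
    and yield only items not seen in previous polls (the 'diff')."""
    seen = set()
    while True:
        for item in fetch():          # one poll of the non-streaming API
            if item not in seen:      # diff against everything seen so far
                seen.add(item)
                yield item            # in Storm this would be an emit()
        time.sleep(interval)          # wait before polling again
```

A spout built this way trades latency for simplicity: new items appear at most one polling interval after the API exposes them, and the `seen` set grows without bound unless old entries are aged out.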

The second important point is that Storm topologies seem to run forever until interrupted, which again doesn't apply to my case. I would like my topology to know that once my finite list of source data has been consumed, processing can terminate and a final result can be emitted.

So, does that all make sense in Storm terms or am I looking at the wrong thing? If so, what alternatives do you propose for this sort of real time parallel computing needs?

Thanks!

asked Feb 21 '12 by palako


1 Answer

Found the answer in the Storm Google group. It seems that in a DRPC topology, the DRPC spout receives a tuple of arguments as a stream, and the topology signals back when processing has finished, using a unique ID called the Request ID.
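The request/response handshake described above can be sketched roughly like this in plain Python (hypothetical names; in real Storm, DRPC topologies are assembled in Java with `LinearDRPCTopologyBuilder`):

```python
import uuid

def drpc_execute(args, process):
    """Sketch of the DRPC pattern: tag a finite request with a unique
    Request ID, stream its pieces through processing, and emit one
    final result once every tuple with that ID has been handled."""
    request_id = str(uuid.uuid4())                    # the unique Request ID
    tuples = [(request_id, item) for item in args]    # finite input as a stream
    partials = [process(item) for rid, item in tuples
                if rid == request_id]                 # per-tuple processing
    return request_id, sum(partials)                  # topology "finishes" here

# usage: aggregate a finite collection, then terminate with one result
rid, total = drpc_execute([1, 2, 3], lambda x: x * x)
```

The key idea is that because every tuple carries the Request ID, the framework can tell when all work belonging to one finite request is done, which is exactly the termination signal the question asks about.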

That same thread notes that Hadoop is probably better suited for these cases, unless the data is small enough to be reprocessed in its entirety each time.

answered Oct 06 '22 by palako