
Twitter Storm v/s Apache Hadoop

Could somebody explain the architectural differences between Twitter Storm and Apache Hadoop? I am looking for internals beyond real time vs. batch processing. The two technologies seem quite similar: you write a topology for Storm or a MapReduce job for Hadoop; Hadoop has the job tracker/task tracker and Storm has the equivalent Nimbus/supervisor; Hadoop has partitioning and Storm has the equivalent stream groupings (shuffle, i.e. random; fields; etc.). (Am I correct in saying that Storm uses message queues internally to transport data between spouts and bolts, whereas Hadoop creates intermediate files and therefore incurs I/O?)

EDIT:

I have gone through the question "Apache Storm compared to Hadoop", but the accepted answer leaves me wanting to know more than just the use case, i.e. real time vs. batch processing.

Yavar asked Aug 07 '13 09:08


2 Answers

The main difference is that Storm can process streams of tuples (incoming data) in real time, while Hadoop does batch processing with MapReduce jobs.

Both of them process data in a distributed way, but with Storm you can have live analytics, whereas with Hadoop you have to wait for the MapReduce job to finish before you can work with your results.
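As a toy illustration of that difference, here is a minimal sketch in plain Python. It does not use the Storm or Hadoop APIs; the function names `batch_word_count` and `streaming_word_count` are hypothetical and chosen only for this example. The batch version returns nothing until all input has been consumed, while the streaming version yields an updated count as each word arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: list[str]) -> Counter:
    """Batch style: the full input must be available, and the
    result is only returned after every line has been processed."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Stream style: keep running state and emit the updated count
    for a word as soon as that word arrives."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.split():
            counts[word] += 1
            yield word, counts[word]


if __name__ == "__main__":
    data = ["storm hadoop", "storm"]

    # Batch: one result, available only at the end.
    print(batch_word_count(data))

    # Streaming: an intermediate result after every word.
    for update in streaming_word_count(data):
        print(update)
```

In the streaming version a consumer can act on each intermediate count immediately, which is roughly what "live analytics" means here; in the batch version there is nothing to look at until the whole job finishes.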

tom answered Sep 20 '22 14:09


Nathan Marz (Storm's creator) is writing a book about Big Data in which he discusses how to build big data systems with Hadoop, Storm and other technologies.

The book discusses the "Lambda Architecture". Check out these slides by Nathan Marz himself: Runaway complexity in Big Data... and a plan to stop it

Chiron answered Sep 24 '22 14:09