How does Storm compare to Hadoop? Hadoop seems to be the de facto standard for open-source, large-scale batch processing. Does Storm have any advantages over Hadoop, or are they completely different?

Apache Storm compared to Hadoop

2 Answers

Why don't you tell us your opinion?

http://www.infoq.com/news/2011/09/twitter-storm-real-time-hadoop/
http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html

Twitter Storm has been touted as real-time Hadoop. That is more of a marketing line for easy consumption.

They are superficially similar, since both are distributed application solutions. Apart from typical distributed architectural elements such as master/slave nodes and ZooKeeper-based coordination, the comparison falls off a cliff for me.

Twitter Storm is more like a pipeline for processing data as it arrives. The pipes connect the computing nodes that receive data, compute, and deliver output. (In their lingo, the data sources are spouts and the processing steps are bolts.) Extend this analogy to a complex pipeline whose wiring can be re-engineered when required, and you get Twitter Storm.
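To make the pipeline idea concrete, here is a minimal sketch in plain Python. It does not use Storm's API; the names tweet_spout, split_bolt, count_bolt and run_topology are made up for illustration, and generators stand in for the pipes between nodes. The point is only that each item flows through every stage as soon as it arrives.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def tweet_spout(source: Iterable[str]) -> Iterator[str]:
    """Spout (hypothetical): emits raw tweets one at a time as they arrive."""
    for tweet in source:
        yield tweet


def split_bolt(tweets: Iterable[str]) -> Iterator[str]:
    """Bolt (hypothetical): splits each incoming tweet into lowercase words."""
    for tweet in tweets:
        for word in tweet.lower().split():
            yield word


def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt (hypothetical): keeps running counts, emits the new total per word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_topology(source: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Wire spout -> split bolt -> count bolt, loosely like a Storm topology."""
    return count_bolt(split_bolt(tweet_spout(source)))


if __name__ == "__main__":
    incoming = ["storm is fast", "hadoop is batch"]
    # Results appear per word, without waiting for the input to finish.
    for word, total in run_topology(incoming):
        print(word, total)
```

Because every stage is lazy, the first count is available after the first tweet has been read, which is roughly the property that makes this style suit unbounded streams.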

In a nutshell, it processes data as it arrives. There is no batch to wait for, so latency is very low.

Hadoop, however, is different in this respect, primarily because of HDFS. It is a solution geared to distributed storage and to tolerating outages at many scales (disks, machines, racks, etc.).

M/R is built to leverage data locality on HDFS to distribute computational jobs. Together, they do not provide a facility for real-time data processing. But that is not always a requirement when you are searching through large data sets (the needle-in-a-haystack case).
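For contrast, the batch model can be sketched as below. This is plain Python, not Hadoop's API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count_job) are hypothetical. The thing to notice is that each phase needs the complete output of the previous one, so no result exists until the whole data set has been read.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    return [(word, 1) for line in lines for word in line.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key before any reducer runs."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run the whole job; the answer is only available once all phases finish."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    stored_lines = ["storm is fast", "hadoop is batch"]
    print(word_count_job(stored_lines))
```

In a real cluster the map tasks would run on the machines that already hold the HDFS blocks; this single-process sketch leaves that out and only shows the all-at-once shape of the computation.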

In short, Twitter Storm is a distributed real-time data processing solution. I don't think we should compare them. Twitter built it because it needed a facility to process tweets that are individually small but arrive in humongous numbers, and in real time.

See HStreaming if you are compelled to compare it with something.

167

answered Sep 29 '22 20:09

pyfunc

Basically, both of them are used for analyzing big data, but Storm is used for real-time processing while Hadoop is used for batch processing.

This is a very good introduction to Storm that I found: Click here

answered Sep 29 '22 20:09

Dao Lam

Tags:

hadoop

streaming

apache-storm

18bytes
