Apache Storm vs Apache Samza vs Apache Spark [closed]

I have worked with Storm and Spark, but Samza is quite new.

I do not understand why Samza was introduced when Storm already exists for real-time processing. Spark provides in-memory, near-real-time processing and has other very useful components such as GraphX and MLlib.

What improvements does Samza bring, and what further improvements are possible?

asked Mar 29 '17 18:03

Amit Kumar

1 Answer

This is a good summary of the differences and pros and cons.

I would just add that Samza, which actually isn't that new, brings a certain simplicity since it is opinionated about using Kafka as its backend, while the others try to be more generic at the cost of simplicity. Samza was pioneered by the same people who created Kafka, who are also the people behind the Kappa Architecture, primarily Jay Kreps, formerly of LinkedIn. That's pretty cool.

Also, the programming models are totally different: Samza processes real-time streams one message at a time, Spark Streaming (which isn't exactly the same as Spark) works in micro-batches, and Storm uses spouts and bolts that pass tuples between them.
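To make that difference concrete, here is a rough sketch in plain Python. It does not use the Samza, Spark, or Storm APIs; the function names (`per_event`, `micro_batches`, `spout`, `split_bolt`) and the `(timestamp, payload)` event shape are made up for illustration, and it only mimics the shape of each model, not how the real frameworks schedule or distribute work.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def per_event(events: Iterable[Event], handle: Callable[[str], str]) -> Iterator[str]:
    """Record-at-a-time style: each message is handled as soon as it arrives."""
    for _, payload in events:
        yield handle(payload)


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Micro-batch style: collect events per fixed time window, emit one list per window."""
    batch: List[str] = []
    current_window = None
    for ts, payload in events:
        window = int(ts // interval)
        if current_window is not None and window != current_window:
            yield batch  # the previous window is complete
            batch = []
        current_window = window
        batch.append(payload)
    if batch:
        yield batch  # flush the last, possibly partial, window


def spout(lines: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout-like source: emits one tuple per input line."""
    for line in lines:
        yield (line,)


def split_bolt(tuples: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt-like step: consumes tuples and emits one tuple per word."""
    for (line,) in tuples:
        for word in line.split():
            yield (word,)


events = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.7, "d"), (2.9, "e")]
one_at_a_time = list(per_event(events, str.upper))
batched = list(micro_batches(events, interval=1.0))
words = list(split_bolt(spout(["storm samza", "spark"])))
```

With the sample events, the per-event version produces one output per message, the micro-batch version produces one list per one-second window, and the spout/bolt chain passes tuples from stage to stage. Which of these shapes feels natural for your problem is a big part of the choice.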

None of these is "better." It all depends on your use cases, the strengths of your team, how well the APIs match your mental models, the quality of support, etc.

You also forgot Apache Flink and Twitter's Heron, which they made because Storm started to fail them. Then again, very few need to operate at the scale of Twitter.

answered Oct 11 '22 23:10

Vidya

Tags:

apache-spark

apache-storm

apache-samza
