Apache Apex looks similar to Apache Storm:

- On both platforms, users build an application/topology as a Directed Acyclic Graph (DAG). Apex uses operators and streams; Storm uses spouts, streams, and bolts.
- Both process data in real time, as opposed to batch processing.
- Both appear to offer high throughput and low latency.

So, at a glance, they look similar and I don't quite see the difference. Can someone please explain the key differences? In other words, when should I use one instead of the other?

How is Apache Apex different from Apache Storm?

2 Answers

There are fundamental architectural differences that make the two platforms very different in terms of latency, scaling, and state management.

At the most basic level:

Apache Storm uses record acknowledgement to guarantee message delivery.
Apache Apex uses checkpointing to guarantee message delivery.
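
To make the contrast concrete, here is a minimal toy sketch of the two ideas. It uses neither the Storm nor the Apex API; `AckTracker`, `CheckpointedCounter`, and `demo` are hypothetical names invented for illustration. Per-record acknowledgement keeps bookkeeping for every in-flight record, while checkpointing snapshots operator state periodically and replays input from the last snapshot after a failure.

```python
from dataclasses import dataclass, field


@dataclass
class AckTracker:
    """Toy per-record tracking: every record is pending until acknowledged."""
    pending: dict = field(default_factory=dict)

    def emit(self, record_id, record):
        self.pending[record_id] = record

    def ack(self, record_id):
        self.pending.pop(record_id, None)

    def to_replay(self):
        # Anything never acknowledged would be sent again after a failure.
        return list(self.pending.values())


@dataclass
class CheckpointedCounter:
    """Toy checkpointing: state is snapshotted every `interval` records."""
    interval: int = 3
    count: int = 0
    processed: int = 0
    snapshot: tuple = (0, 0)  # (count, processed) at the last checkpoint

    def process(self, value):
        self.count += value
        self.processed += 1
        if self.processed % self.interval == 0:
            self.snapshot = (self.count, self.processed)

    def recover(self):
        # Restore the last snapshot; the caller replays input from this offset.
        self.count, self.processed = self.snapshot
        return self.processed


def demo():
    tracker = AckTracker()
    for i, rec in enumerate(["a", "b", "c"]):
        tracker.emit(i, rec)
    tracker.ack(0)
    tracker.ack(2)  # record 1 ("b") is never acknowledged

    counter = CheckpointedCounter(interval=3)
    data = [1, 2, 3, 4, 5]
    for v in data:
        counter.process(v)
    offset = counter.recover()  # simulate a crash after 5 records
    for v in data[offset:]:
        counter.process(v)      # replay the records after the checkpoint
    return tracker.to_replay(), offset, counter.count
```

In this toy model the acknowledgement side pays a small cost per record, whereas the checkpointing side pays a cost per snapshot and must be able to replay the input received since that snapshot.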

You can learn about more differences in the following blog post, which also covers the other major stream processing platforms.

https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/

answered Oct 31 '22 15:10

ashwin111

Architecture and Features

+-------------------+---------------------------+---------------------+
|                   |           Storm           |         Apex        |
+-------------------+---------------------------+---------------------+
| Model             | Native Streaming          | Native Streaming    |
|                   | Micro-batch (Trident)     |                     |
+-------------------+---------------------------+---------------------+
| Language          | Java                      | Java (Scala)        |
|                   | Non-JVM languages         |                     |
|                   | supported                 |                     |
+-------------------+---------------------------+---------------------+
| API               | Compositional             | Compositional (DAG) |
|                   | Declarative (Trident)     | Declarative         |
|                   | Limited SQL               |                     |
|                   | support (Trident)         |                     |
+-------------------+---------------------------+---------------------+
| Locality          | Data Locality             | Advanced Processing |
+-------------------+---------------------------+---------------------+
| Latency           | Low                       | Very Low            |
|                   | High (Trident)            |                     |
+-------------------+---------------------------+---------------------+
| Throughput        | Limited in Ack mode       | Very high           |
+-------------------+---------------------------+---------------------+
| Scalability       | Limited due to Ack        | Horizontal          |
+-------------------+---------------------------+---------------------+
| Partitioning      | Standard                  | Advanced            |
|                   | Set parallelism at        | Parallel pipes,     |
|                   | worker, executor and      | unifiers            |
|                   | task level                |                     |
+-------------------+---------------------------+---------------------+
| Connector Library | Limited (certification)   | Rich library of     |
|                   |                           | connectors in       |
|                   |                           | Apex Malhar         |
+-------------------+---------------------------+---------------------+
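
The "parallel pipes, unifiers" entry in the Partitioning row can be pictured with a small toy sketch. This is not Apex code: `partition`, `partial_sum`, `unify`, and `word_totals` are hypothetical helpers, and the hash-based routing is an assumption chosen for illustration. The idea is that a stream is split across several partitions that work independently, and a unifier merges their partial results into one.

```python
from collections import Counter
from zlib import crc32


def partition(records, n):
    """Route each (key, value) record to one of n partitions by a stable key hash."""
    parts = [[] for _ in range(n)]
    for key, value in records:
        parts[crc32(key.encode("utf-8")) % n].append((key, value))
    return parts


def partial_sum(part):
    """Work done independently inside one partition."""
    totals = Counter()
    for key, value in part:
        totals[key] += value
    return totals


def unify(partials):
    """Merge the per-partition results back into a single result."""
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds counts key by key
    return merged


def word_totals(records, n=3):
    return dict(unify(partial_sum(p) for p in partition(records, n)))
```

Because the merge step is a plain sum, the final totals do not depend on how many partitions are used, which is what lets the partition count change without changing the result.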

Operability

+------------+--------------------------+---------------------+
|            |           Storm          |         Apex        |
+------------+--------------------------+---------------------+
| State      | External store           | Checkpointing       |
| Management | Limited checkpointing    | Local checkpointing |
|            | Difficult to exploit     |                     |
|            | local state              |                     |
+------------+--------------------------+---------------------+
| Recovery   | Cumbersome API to        | Incremental         |
|            | store and retrieve state | (buffer server)     |
|            | Requires user code       |                     |
+------------+--------------------------+---------------------+
| Processing | At least once            | At least once       |
| Semantics  | Exactly once requires    | End-to-end          |
|            | user code and affects    | exactly once        |
|            | latency                  |                     |
+------------+--------------------------+---------------------+
| Back       | Watermark on queue       | Automatic           |
| Pressure   | size for spout and bolt  | Buffer server       |
|            | Does not scale           | memory and disk     |
+------------+--------------------------+---------------------+
| Elasticity | Through CLI only         | Yes w/ full user    |
|            |                          | control             |
+------------+--------------------------+---------------------+
| Dynamic    | No                       | Yes                 |
| topology   |                          |                     |
+------------+--------------------------+---------------------+
| Security   | Kerberos                 | Kerberos, RBAC,     |
|            |                          | LDAP                |
+------------+--------------------------+---------------------+
| Multi      | Mesos, RAS - memory,     | YARN                |
| Tenancy    | CPU, YARN                | full isolation      |
+------------+--------------------------+---------------------+
| DevOps     | REST API                 | REST API            |
| Tools      | Basic UI                 | DataTorrent RTS     |
+------------+--------------------------+---------------------+
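
As a rough illustration of the "watermark on queue size" approach in the Back Pressure row, here is a toy sketch. It is not Storm's implementation: `WatermarkQueue` is a hypothetical class, and the 0.9 and 0.4 thresholds are illustrative values assumed for this example. Upstream is throttled once the queue fills to the high watermark and released only after it drains to the low watermark.

```python
from collections import deque


class WatermarkQueue:
    """Bounded queue that signals back pressure between a high and a low watermark."""

    def __init__(self, capacity, high=0.9, low=0.4):
        self.capacity = capacity
        self.high = high * capacity
        self.low = low * capacity
        self.items = deque()
        self.throttled = False

    def _update(self):
        if len(self.items) >= self.high:
            self.throttled = True   # tell upstream to stop emitting
        elif len(self.items) <= self.low:
            self.throttled = False  # drained enough, upstream may resume

    def offer(self, item):
        """Upstream calls this; returns False while back pressure is on."""
        if self.throttled:
            return False
        self.items.append(item)
        self._update()
        return True

    def poll(self):
        """Downstream calls this; returns None if the queue is empty."""
        item = self.items.popleft() if self.items else None
        self._update()
        return item
```

Using two thresholds instead of one avoids rapid on/off flapping when the queue size hovers around a single limit.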

Source: Webinar: Apache Apex (Next Gen Hadoop) vs. Storm - Comparison and Migration Outline https://www.youtube.com/watch?v=sPjyo2HfD_I

answered Oct 31 '22 16:10

brusli

Tags:

apache-storm

bigdata

stream-processing

apache-apex

PradeepKumbhar
