 

How Apache Apex is different from Apache Storm?

Apache Apex looks similar to Apache Storm.

  • On both platforms, users build the application/topology as a Directed Acyclic Graph (DAG). Apex uses operators/streams and Storm uses spouts/streams/bolts.
  • They both process data in real time as opposed to batch processing.
  • Both seem to have high throughput and low latency.

So, at a glance, both look similar and I'm not quite getting the difference. Can someone please explain the key differences? In other words, when should I use one instead of the other?

asked Apr 14 '16 07:04 by PradeepKumbhar



2 Answers

There are fundamental architectural differences that make the two platforms very different in terms of latency, scaling and state management.

At the most basic level:

  1. Apache Storm uses record acknowledgement to guarantee message delivery.
  2. Apache Apex uses checkpointing to guarantee message delivery.
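To make the acknowledgement side concrete, here is a minimal plain-Java sketch of the idea behind Storm's record acknowledgement. Storm's acker tracks the tree of tuples spawned from each spout tuple by XOR-ing random 64-bit tuple IDs into a single value: each ID is XOR-ed in once when the tuple is emitted and once more when it is acked, so the value returns to zero when every tuple in the tree has been acknowledged. The class and method names (`AckTracker`, `emitRoot`, `emitChild`, `ack`, `isFullyProcessed`) are hypothetical and are not Storm's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of XOR-based ack tracking, modeled on the idea behind
// Storm's acker. None of these names are Storm APIs.
class AckTracker {
    // One 64-bit "ack value" per spout (root) tuple, keyed by a message key.
    private final Map<Long, Long> ackVals = new HashMap<>();
    private final Random random;

    AckTracker(long seed) {
        this.random = new Random(seed);
    }

    // Spout emits a root tuple: its random ID becomes the initial ack value.
    long emitRoot(long rootKey) {
        long id = random.nextLong();
        ackVals.put(rootKey, id);
        return id;
    }

    // A bolt emits a child tuple anchored to the same tree: XOR its ID in once.
    long emitChild(long rootKey) {
        long id = random.nextLong();
        ackVals.merge(rootKey, id, (a, b) -> a ^ b);
        return id;
    }

    // A tuple is acked: XOR its ID in a second time, cancelling the first.
    void ack(long rootKey, long tupleId) {
        ackVals.merge(rootKey, tupleId, (a, b) -> a ^ b);
    }

    // Zero means every emitted ID was also acked. Reaching zero early through
    // an ID collision is possible in principle but vanishingly unlikely.
    boolean isFullyProcessed(long rootKey) {
        Long v = ackVals.get(rootKey);
        return v != null && v == 0L;
    }
}
```

Because XOR is commutative and each ID cancels itself, emits and acks can arrive in any order and the tracker needs only constant memory per spout tuple. The sketch omits the timeout path: in Storm, a spout tuple whose tree is not completed in time is failed and can be replayed by the spout, which is where at-least-once delivery comes from.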

You can learn about more differences in the following blog post, which also covers other mainstream stream processing platforms.

https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/

answered Oct 31 '22 15:10 by ashwin111


Architecture and Features

+-------------------+---------------------------+---------------------+
|                   |           Storm           |         Apex        |
+-------------------+---------------------------+---------------------+
| Model             | Native Streaming          | Native Streaming    |
|                   | Micro-batch (Trident)     |                     |
+-------------------+---------------------------+---------------------+
| Language          | Java                      | Java (Scala)        |
|                   | Supports non-JVM          |                     |
|                   | languages                 |                     |
+-------------------+---------------------------+---------------------+
| API               | Compositional             | Compositional (DAG) |
|                   | Declarative (Trident)     | Declarative         |
|                   | Limited SQL               |                     |
|                   | support (Trident)         |                     |
+-------------------+---------------------------+---------------------+
| Locality          | Data Locality             | Advanced Processing |
+-------------------+---------------------------+---------------------+
| Latency           | Low                       | Very Low            |
|                   | High (Trident)            |                     |
+-------------------+---------------------------+---------------------+
| Throughput        | Limited in Ack mode       | Very high           |
+-------------------+---------------------------+---------------------+
| Scalability       | Limited due to Ack        | Horizontal          |
+-------------------+---------------------------+---------------------+
| Partitioning      | Standard                  | Advanced            |
|                   | Parallelism set per       | Parallel pipes,     |
|                   | worker, executor and task | unifiers            |
+-------------------+---------------------------+---------------------+
| Connector Library | Limited (certification)   | Rich library of     |
|                   |                           | connectors in       |
|                   |                           | Apex Malhar         |
+-------------------+---------------------------+---------------------+
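The "Parallel pipes, unifiers" entry refers to Apex splitting one logical operator into several parallel partitions and merging their outputs through a unifier before the downstream operator. The plain-Java sketch below only illustrates that idea and is not Apex's API: `CountPartition` and `PartitionedCount` are hypothetical names, tuples are routed to partitions by key hash, and `unify` plays the role of the unifier by summing the partial counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of partitioning plus a unifier; not Apex classes.
class CountPartition {
    // Partial word counts for the keys routed to this partition.
    final Map<String, Integer> counts = new HashMap<>();

    void process(String word) {
        counts.merge(word, 1, Integer::sum);
    }
}

class PartitionedCount {
    // Routes each word to one of n partitions by hash, then unifies the results.
    static Map<String, Integer> run(List<String> words, int n) {
        List<CountPartition> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new CountPartition());
        }
        for (String w : words) {
            // floorMod keeps the index non-negative for negative hash codes.
            partitions.get(Math.floorMod(w.hashCode(), n)).process(w);
        }
        return unify(partitions);
    }

    // The "unifier": merges the partial outputs of all partitions into one result.
    static Map<String, Integer> unify(List<CountPartition> partitions) {
        Map<String, Integer> merged = new HashMap<>();
        for (CountPartition p : partitions) {
            p.counts.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }
}
```

With hash routing each key lands in exactly one partition; summing in the unifier keeps the result correct even if tuples were spread across partitions some other way, for example round-robin.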

Operability

+------------+--------------------------+---------------------+
|            |           Storm          |         Apex        |
+------------+--------------------------+---------------------+
| State      | External store           | Checkpointing       |
| Management | Limited checkpointing    | Local checkpointing |
|            | Difficult to exploit     |                     |
|            | local state              |                     |
+------------+--------------------------+---------------------+
| Recovery   | Cumbersome API to        | Incremental         |
|            | store and retrieve state | (buffer server)     |
|            | Requires user code       |                     |
+------------+--------------------------+---------------------+
| Processing | At least once            | At least once       |
| Semantics  | Exactly once requires    | End-to-end          |
|            | user code and affects    | exactly once        |
|            | latency                  |                     |
+------------+--------------------------+---------------------+
| Back       | Watermark on queue       | Automatic           |
| Pressure   | size for spout and bolt  | Buffer server       |
|            | Does not scale           | memory and disk     |
+------------+--------------------------+---------------------+
| Elasticity | Through CLI only         | Yes, with full user |
|            |                          | control             |
+------------+--------------------------+---------------------+
| Dynamic    | No                       | Yes                 |
| topology   |                          |                     |
+------------+--------------------------+---------------------+
| Security   | Kerberos                 | Kerberos, RBAC,     |
|            |                          | LDAP                |
+------------+--------------------------+---------------------+
| Multi      | Mesos, RAS - memory,     | YARN                |
| Tenancy    | CPU, YARN                | full isolation      |
+------------+--------------------------+---------------------+
| DevOps     | REST API                 | REST API            |
| Tools      | Basic UI                 | DataTorrent RTS     |
+------------+--------------------------+---------------------+

Source: Webinar: Apache Apex (Next Gen Hadoop) vs. Storm - Comparison and Migration Outline https://www.youtube.com/watch?v=sPjyo2HfD_I
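The "Checkpointing" and "Incremental (buffer server)" entries describe Apex's recovery model: operator state is checkpointed at streaming-window boundaries and upstream output is retained in the buffer server, so after a failure an operator is restored from its last checkpoint and only the windows after it are replayed. The sketch below models that in plain Java under simplifying assumptions (a single summing operator, an in-memory list standing in for the buffer server, a checkpoint every `checkpointEvery` windows); `CheckpointedSum` and `ReplayDemo` are hypothetical names, not Apex classes.

```java
import java.util.List;

// Hypothetical sketch of checkpoint-and-replay recovery; not Apex APIs.
class CheckpointedSum {
    private long state;                 // live operator state
    private long checkpointState;       // last checkpointed copy of the state
    private long checkpointWindow = -1; // last window included in the checkpoint
    private final int checkpointEvery;

    CheckpointedSum(int checkpointEvery) {
        this.checkpointEvery = checkpointEvery;
    }

    // Processes one window of tuples and checkpoints at the window boundary.
    void processWindow(long windowId, List<Long> tuples) {
        for (long t : tuples) {
            state += t;
        }
        if ((windowId + 1) % checkpointEvery == 0) {
            checkpointState = state;
            checkpointWindow = windowId;
        }
    }

    // Simulates a crash: live state is lost and restored from the checkpoint.
    // Returns the last window covered by that checkpoint (-1 if none yet).
    long recover() {
        state = checkpointState;
        return checkpointWindow;
    }

    long state() {
        return state;
    }
}

class ReplayDemo {
    // Processes windows up to failAfter, "crashes", then replays the windows
    // after the last checkpoint from the retained (buffered) upstream data.
    static long run(List<List<Long>> windows, int checkpointEvery, int failAfter) {
        CheckpointedSum op = new CheckpointedSum(checkpointEvery);
        for (int w = 0; w <= failAfter; w++) {
            op.processWindow(w, windows.get(w));
        }
        long resumeFrom = op.recover() + 1;
        for (int w = (int) resumeFrom; w < windows.size(); w++) {
            op.processWindow(w, windows.get(w));
        }
        return op.state();
    }
}
```

Calling `ReplayDemo.run(windows, 2, 2)` with windows `[1, 2]`, `[3]`, `[4, 5]`, `[6]` simulates a failure after window 2, rolls back to the checkpoint taken after window 1, replays windows 2 and 3, and still ends with a sum of 21. Rolling the state back together with the replay is what keeps replayed tuples from being counted twice inside the operator; it does not by itself make writes to external systems exactly once.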

answered Oct 31 '22 16:10 by brusli