 

Apache Apex vs Apache Flink

As both are streaming frameworks that process one event at a time, what are the core architectural differences between these two technologies/streaming frameworks?

Also, what are some particular use cases where one is more appropriate than the other?

Biplob Biswas asked Aug 24 '17 at 12:08





1 Answer

As you mentioned, both are streaming platforms that do in-memory computation in real time. But there are some architectural differences when you take a closer look.

  1. Apex has a YARN-native architecture: it fully utilises YARN for scheduling, security and multi-tenancy, whereas Flink integrates with YARN. Apex can also do resource allocation at the operator (container) level with YARN.
  2. Partitioning: Apex supports several sophisticated stream partitioning schemes and also allows control over operator locality and stream locality. Flink supports simple hash partitioning and custom partitioning.
  3. Apex allows dynamic changes to the topology without having to take down the application. The application can be updated at runtime, so you can add and remove operators, update operator properties, or automatically scale the application. Apache Flink does not support any of these capabilities.
  4. Buffer Server: There is a message bus called the buffer server between operators. Subscribers can connect to the buffer server and fetch data from particular offsets. It is window aware, and it holds data until no subscriber needs it any more.
  5. Fault tolerance: Apex has an incremental recovery model. On failure, only part of the topology needs to be restarted, with no need to go back to the source, whereas Flink goes back to the source.
  6. Apex has both a high-level API and a low-level API. Flink only has a high-level API.
  7. Apex has a library called Apache Apex Malhar, which has a vast variety of well-tested connectors and processing operators that can be reused easily.
  8. Lastly, Apex is more focused on productizing big data applications, so it has many features that help with easy development and maintenance of applications.
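To make points 4 and 5 a little more concrete, here is a minimal sketch of a window-aware buffer between operators that keeps data until every subscriber has moved past it, and lets a restarted subscriber resume from its last committed window instead of replaying from the source. This is a hypothetical, simplified model: the class `BufferServerSketch` and its methods (`publish`, `subscribe`, `fetchAfter`, `commit`, `lastCommitted`) are illustrative names, not Apex's actual buffer server API, and networking, spooling to disk and concurrency are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a window-aware buffer between operators.
// Names are illustrative only; this is NOT the real Apex buffer server API.
public class BufferServerSketch {
    // Tuples buffered per streaming window id, kept in window order.
    private final TreeMap<Long, List<String>> windows = new TreeMap<>();
    // For each subscriber, the last window id it has committed as fully processed.
    private final Map<String, Long> committed = new HashMap<>();

    // Upstream operator writes a tuple into a given window.
    public void publish(long windowId, String tuple) {
        windows.computeIfAbsent(windowId, k -> new ArrayList<>()).add(tuple);
    }

    // Register a downstream subscriber that has not committed any window yet.
    public void subscribe(String subscriber) {
        committed.putIfAbsent(subscriber, Long.MIN_VALUE);
    }

    // Fetch all buffered tuples from windows strictly after the given window id (the "offset").
    public List<String> fetchAfter(long windowId) {
        List<String> out = new ArrayList<>();
        for (List<String> tuples : windows.tailMap(windowId, false).values()) {
            out.addAll(tuples);
        }
        return out;
    }

    // Subscriber reports that everything up to and including windowId is done.
    public void commit(String subscriber, long windowId) {
        committed.put(subscriber, windowId);
        purge();
    }

    // Last committed window of a registered subscriber (its recovery point).
    public long lastCommitted(String subscriber) {
        return committed.get(subscriber);
    }

    public int bufferedWindows() {
        return windows.size();
    }

    // Drop only the windows that every subscriber has already committed.
    private void purge() {
        if (committed.isEmpty()) {
            return;
        }
        long slowest = Collections.min(committed.values());
        windows.headMap(slowest, true).clear();
    }

    public static void main(String[] args) {
        BufferServerSketch bs = new BufferServerSketch();
        bs.subscribe("counter");
        bs.subscribe("writer");
        bs.publish(1, "a");
        bs.publish(1, "b");
        bs.publish(2, "c");
        bs.publish(3, "d");

        bs.commit("counter", 2); // "writer" has committed nothing, so nothing is dropped
        bs.commit("writer", 1);  // both are past window 1, so window 1 is dropped
        System.out.println(bs.bufferedWindows()); // 2

        // If "writer" fails now, it re-fetches from its last committed window
        // rather than asking the source to replay.
        System.out.println(bs.fetchAfter(bs.lastCommitted("writer"))); // [c, d]
    }
}
```

In this model the slowest subscriber decides how long a window stays buffered, and recovery only touches the failed subscriber. How Apex itself checkpoints operators and spools buffered data is not shown here.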

Note: I am a committer to Apache Apex, so I might sound biased to Apex :)

priya answered Oct 29 '22 at 19:10
