
Apache Beam over Apache Kafka Stream processing

What are the differences between Apache Beam and Apache Kafka with respect to Stream processing? I am trying to grasp the technical and programmatic differences as well.

Please help me understand, drawing on your own experience.

Stella asked Jun 14 '18 20:06



2 Answers

Beam is an API that runs on top of an underlying stream processing engine, such as Flink or Spark, in one unified way.

Kafka is mainly an integration platform that offers a messaging system based on topics that standalone applications use to communicate with each other.

On top of this messaging system (and the Producer/Consumer API), Kafka offers an API to perform stream processing, using messages as data and topics as input or output. Kafka stream processing applications are standalone Java applications that act as regular Kafka consumers and producers (this is important for understanding how these applications are managed and how the workload is shared among stream processing application instances).

In short, Kafka stream processing applications are standalone Java applications that run outside the Kafka cluster, read their input from the Kafka cluster, and write their results back to the Kafka cluster. With other stream processing platforms, stream processing applications run inside the cluster engine (and are managed by this engine), read their input from somewhere else, and write their results somewhere else.
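The consume-transform-produce shape described above can be sketched in plain Python. This is a toy in-memory model and not the Kafka Streams API (which is a Java library): the `topics` dictionary, the `run_stream_app` function, and the topic names are all hypothetical stand-ins for a broker and for a standalone application that reads one topic and writes to another.

```python
from collections import defaultdict, deque

# Toy stand-in for a Kafka cluster: topic name -> queue of (key, value) records.
# Hypothetical model for illustration only; a real broker is a separate process.
topics = defaultdict(deque)


def run_stream_app(source, sink, transform):
    """Drain `source`, apply `transform` to each value, append results to `sink`.

    Mirrors the shape of a stream processing application that runs outside
    the cluster: it only consumes from one topic and produces to another.
    """
    processed = 0
    while topics[source]:
        key, value = topics[source].popleft()
        topics[sink].append((key, transform(value)))
        processed += 1
    return processed


# Some other application produces order amounts (in dollars) to the input topic.
topics["orders"].extend([("alice", 10), ("bob", 25), ("alice", 5)])

# The "stream application" converts each amount to cents and writes it back.
count = run_stream_app("orders", "orders-cents", lambda dollars: dollars * 100)
```

The application itself holds no cluster state here; everything it reads and writes lives in the topics, which is the point the answer makes about where such applications run.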

One big difference between the Kafka Streams API and the Beam API is that Beam distinguishes between bounded and unbounded data inside the data stream, whereas Kafka does not. As a result, bounded data has to be handled manually with the Kafka API, using time-based or session windows to gather the data.
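To make the idea of cutting bounded chunks out of an unbounded stream concrete, here is a minimal sketch of a tumbling (fixed-size) window in plain Python. It is not the windowing API of Kafka Streams or Beam; `tumbling_windows` and `sensor_readings` are hypothetical names, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from itertools import count, islice


def tumbling_windows(events, size):
    """Group (timestamp, value) events into fixed windows of `size` time units.

    Yields (window_start, [values]) each time an event arrives that belongs
    to a later window. Assumes timestamps are non-decreasing; late or
    out-of-order events are not handled in this sketch.
    """
    current_start = None
    bucket = []
    for timestamp, value in events:
        start = timestamp - timestamp % size
        if current_start is not None and start != current_start:
            yield current_start, bucket
            bucket = []
        current_start = start
        bucket.append(value)
    # Only reached for a bounded input: flush the final, partial window.
    if bucket:
        yield current_start, bucket


def sensor_readings():
    """Hypothetical unbounded source: one reading per time unit, forever."""
    for t in count():
        yield t, t * 10


# The source never ends, so take only the first three closed windows.
first_three = list(islice(tumbling_windows(sensor_readings(), size=4), 3))
```

With an unbounded source, a window is only emitted once a later event shows that it has closed. With a bounded input, the same function also flushes the last partial window when the input ends.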

Guillaume Braibant answered Sep 20 '22 16:09


Beam is a programming API rather than a system or library you can run by itself. Multiple Beam runners are available that implement the Beam API.
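The split between an API and the runners that implement it can be illustrated with a toy model in plain Python. This is not the Beam SDK: `Pipeline`, `EagerRunner`, and `LazyRunner` are hypothetical classes that only show one pipeline description being executed by two interchangeable engines.

```python
class Pipeline:
    """Runner-independent description of a computation: an ordered list of steps."""

    def __init__(self):
        self.steps = []

    def map(self, fn):
        self.steps.append(("map", fn))
        return self

    def filter(self, predicate):
        self.steps.append(("filter", predicate))
        return self


class EagerRunner:
    """Executes each step over the whole input before moving to the next."""

    def run(self, pipeline, data):
        items = list(data)
        for kind, fn in pipeline.steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


class LazyRunner:
    """Executes the same steps element by element using chained iterators."""

    def run(self, pipeline, data):
        stream = iter(data)
        for kind, fn in pipeline.steps:
            stream = map(fn, stream) if kind == "map" else filter(fn, stream)
        return list(stream)


# The pipeline is defined once, without reference to any particular runner.
pipeline = Pipeline().map(lambda x: x * 2).filter(lambda x: x > 4)
eager_result = EagerRunner().run(pipeline, [1, 2, 3, 4])
lazy_result = LazyRunner().run(pipeline, [1, 2, 3, 4])
```

Both runners produce the same result from the same pipeline object. That is the property the answer attributes to Beam: the program is written against the API, and the choice of execution engine is made separately.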

Kafka is a stream processing platform and ships with Kafka Streams (aka the Streams API), a Java stream processing library that is built to read data from Kafka topics and write results back to Kafka topics.

Matthias J. Sax answered Sep 19 '22 16:09