
Kafka Connect and Streams

I recently started reading about Kafka and I am a little confused about the difference between Kafka Connect and Kafka Streams. By definition, Kafka Streams can read data from a Kafka topic, process it, and push the output to another Kafka topic, while Kafka Connect moves large data sets into and out of Kafka.

My question is: why do we need Kafka Connect when Kafka Streams can pretty much read the data, process it, and push it to a topic? Why one extra component? It would be great if someone could explain the difference. Thanks in advance :)

Anuja Barve asked Jan 28 '26 13:01


2 Answers

Kafka Streams is a stream processing library for Apache Kafka. You use it to build streaming applications that read data from and write data to Kafka topics. It is a general-purpose library.

Kafka Connect, on the other hand, is a "data integration" framework. You typically use it to import data from an external system, such as a relational database, into a Kafka topic. You can use the same framework to export data as well.

There are many connectors for different data storage systems: HDFS, relational databases, Elasticsearch, and more.
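To illustrate that a connector is set up through configuration and not through application code, here is a minimal Python sketch that builds the JSON body one would POST to a Kafka Connect worker's REST API (`/connectors`) to start a source connector. It assumes the Confluent JDBC source connector plugin is installed on the worker. The connector name, database URL, column name, and topic prefix are hypothetical placeholders, and the sketch only prints the payload; it does not contact a worker.

```python
import json

# Hypothetical connector definition; assumes the Confluent JDBC source
# connector plugin is installed on the Connect worker.
connector = {
    "name": "orders-jdbc-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "1",
        "connection.url": "jdbc:postgresql://localhost:5432/shop",  # placeholder URL
        "mode": "incrementing",             # pick up new rows by a growing id
        "incrementing.column.name": "id",   # assumed auto-increment column
        "topic.prefix": "db-",              # table "orders" -> topic "db-orders"
    },
}

# Request body for: POST http://<connect-worker>:8083/connectors
payload = json.dumps(connector, indent=2)
print(payload)
```

No code in this definition reads rows or produces records; the framework and the connector plugin do that work.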

One possible scenario that uses both components (Kafka Connect and Kafka Streams) would be:

1. Continuously import data from a relational database into a Kafka topic using Kafka Connect.
2. Process that data with a Kafka Streams app, which writes its results to an output topic.
3. Export data from that output topic into Elasticsearch using Kafka Connect.
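The division of labor in this scenario can be sketched with a small in-memory simulation. This is plain Python and not the Kafka Connect or Kafka Streams API: the `topics` dictionary stands in for a Kafka cluster, and every function name, topic name, and record field is a hypothetical chosen for illustration.

```python
from collections import defaultdict

# In-memory stand-in for a Kafka cluster: topic name -> list of records.
topics = defaultdict(list)

def source_connector(rows, topic):
    """Kafka Connect role (import): copy external rows into a topic unchanged."""
    for row in rows:
        topics[topic].append(row)

def streams_app(in_topic, out_topic):
    """Kafka Streams role: read a topic, transform records, write to another topic."""
    for record in topics[in_topic]:
        if record["amount"] >= 100:  # filter: keep large orders only
            topics[out_topic].append(
                {"id": record["id"], "amount_cents": record["amount"] * 100}
            )

def sink_connector(topic, index):
    """Kafka Connect role (export): copy topic records into an external store."""
    for record in topics[topic]:
        index[record["id"]] = record

database_rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]
search_index = {}  # stands in for Elasticsearch

source_connector(database_rows, "orders")
streams_app("orders", "large-orders")
sink_connector("large-orders", search_index)
print(search_index)  # prints {2: {'id': 2, 'amount_cents': 15000}}
```

The two connector functions only copy data across the boundary of the cluster, while the processing logic lives entirely in the middle step, which never touches an external system.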

This blog post gives a good overview of both technologies working together: https://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams/

codejitsu answered Jan 31 '26 22:01


Kafka Connect: Since Kafka acts as a central data hub, it has to connect to many kinds of external data sources and import their data. These integrations all behave in much the same way, so having a common framework and standard for them is useful and clean. That is why Kafka Connect exists. It is just a bridge. It is not meant for data transformation; at most it applies lightweight, per-record changes (Single Message Transforms) while copying.

Kafka Streams: It is made specifically for data transformation, so the computation-related operations (such as filtering, mapping, joins, and aggregations) are available here.
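As a rough analogy for the kind of computation meant here, the following plain-Python sketch keeps a running count per key and emits an update after every record, similar in spirit to a `groupByKey().count()` aggregation in the Kafka Streams DSL. Kafka Streams itself is a Java library; the function name and the sample records below are hypothetical and only illustrate the idea of stateful processing.

```python
from collections import Counter

def running_counts(records):
    """Yield (key, count) after each record, like a continuously updated table."""
    counts = Counter()  # local state, one counter per key
    for key, _value in records:
        counts[key] += 1
        yield key, counts[key]

page_views = [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]
updates = list(running_counts(page_views))
print(updates)  # prints [('alice', 1), ('bob', 1), ('alice', 2)]
```

A bridge that only copies records has no need for this kind of per-key state, which is one reason the two components are kept separate.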

Namjith Aravind answered Jan 31 '26 21:01