
How to connect to multiple clusters in a single Kafka Streams application?

In the Kafka Streams Developer Guide it says:

Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value. Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input streams and writing output streams.

Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?

Could I create multiple KafkaStreams instances with different properties that connect to different clusters?

mixiul__ asked Aug 23 '17 19:08





2 Answers

Just to add to the excellent answer from @Matthias J. Sax.

Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?

I think there are two questions here.

It depends on what "my whole application" means. It could be a single KafkaStreams instance, multiple KafkaStreams instances in a single JVM, or perhaps multiple KafkaStreams instances in a single JVM inside a Docker container that runs as a pod. Whichever you mean, "my whole application" is too broad to be precise.

The point is that a single KafkaStreams instance cannot talk to multiple Kafka clusters: its configuration is a map of key-value properties, so the setting that names the cluster to connect to holds exactly one value. That alone answers the second half of your question.
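The key-value argument can be seen without Kafka on the classpath. The sketch below uses only java.util.Properties and the real config keys application.id and bootstrap.servers; the class name, application id, and host names are hypothetical. It is an illustration of the map behaviour, not of how Kafka Streams validates its configuration.

```java
import java.util.Properties;

class SingleClusterConfig {
    // Builds the kind of key-value configuration a KafkaStreams instance is created from.
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");     // hypothetical application id
        props.put("bootstrap.servers", "cluster-a:9092");  // hypothetical address of cluster A
        // A second put on the same key replaces the first value;
        // it does not add a second cluster.
        props.put("bootstrap.servers", "cluster-b:9092");  // hypothetical address of cluster B
        return props;
    }

    public static void main(String[] args) {
        // Only the last value survives.
        System.out.println(build().getProperty("bootstrap.servers")); // prints cluster-b:9092
    }
}
```

Note that bootstrap.servers may hold a comma-separated list of brokers, but those brokers are expected to belong to the same cluster.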


Being unable to use two or more Kafka clusters in a single Kafka Streams application is one of the differences between Kafka Streams and Spark Structured Streaming. The latter can use as many Kafka clusters as you want, so you can build pipelines between different Kafka clusters.

Jacek Laskowski answered Nov 23 '22 08:11



It means that a single application can only connect to one cluster.

  • You cannot read a topic from cluster A and write the result of your computation to cluster B.
  • It's not possible to read two topics from two different clusters with the same instance.

Could I create multiple KafkaStreams instances with different properties that connect to different clusters?

Yes, absolutely. But those different instances will be different applications. (Think "consumer groups".)

Update:

Within a single JVM, you can create as many KafkaStreams instances as you like. You can also configure them to connect to different clusters (and you can use the same KStreamBuilder, or StreamsBuilder in Kafka 1.0 and later, for all of them if you want to do the same processing).
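A minimal sketch of the configuration side of this, assuming two clusters reachable at the hypothetical addresses cluster-a:9092 and cluster-b:9092. The class name, helper name, and application ids are also made up for illustration. To keep the example runnable with the JDK alone, the KafkaStreams calls appear only as comments.

```java
import java.util.Properties;

class MultiClusterConfigs {
    // One Properties object per KafkaStreams instance, each naming its own cluster.
    static Properties configFor(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        Properties forClusterA = configFor("app-on-cluster-a", "cluster-a:9092"); // hypothetical
        Properties forClusterB = configFor("app-on-cluster-b", "cluster-b:9092"); // hypothetical

        // With kafka-streams on the classpath, each config would go to its own
        // instance in the same JVM, for example (Kafka 1.0+ constructor):
        //   new KafkaStreams(topology, forClusterA).start();
        //   new KafkaStreams(topology, forClusterB).start();

        System.out.println(forClusterA.getProperty("bootstrap.servers"));
        System.out.println(forClusterB.getProperty("bootstrap.servers"));
    }
}
```

Each instance still reads from and writes to its own cluster only; data does not flow between the two.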

Matthias J. Sax answered Nov 23 '22 08:11
