In the Kafka Streams Developer Guide it says:
Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value. Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input streams and writing output streams.
Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?
Could I create multiple KafkaStreams instances with different properties that connect to different clusters?
A consumer group, such as a Kafka Streams-based application, can process data from a single Kafka cluster only. Therefore, multi-topic subscriptions or load balancing across the consumers in a consumer group are possible only within a single Kafka cluster.
A Kafka cluster consists of one or more brokers that host the partitions of its topics. A multi-cluster setup connects two or more such clusters so that producers and consumers can work with data across them.
Threading Model: Kafka Streams allows the user to configure the number of threads that the library can use to parallelize processing within an application instance. Each thread can execute one or more stream tasks with their processor topologies independently.
Just to add to the excellent answer from @Matthias J. Sax.
Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?
I think there are two questions here.
It depends on the definition of "my whole application": it could simply be a single KafkaStreams instance, or multiple KafkaStreams instances in a single JVM, or perhaps multiple KafkaStreams instances in a single JVM inside a Docker container that runs as a pod. Whatever it is, I find "my whole application" a bit too broad and not very precise.
The point is that you cannot create a KafkaStreams instance that talks to multiple Kafka clusters: the configuration is a map of key-value properties, so each instance has exactly one bootstrap.servers setting. Just by this you could answer your own question, couldn't you?
Being unable to use two or more Kafka clusters in a Kafka Streams application is one of the differences between Kafka Streams and Spark Structured Streaming (with the latter being able to use as many Kafka clusters as you want and so you could build pipelines between different Kafka clusters).
It means that a single application can only connect to one cluster.
Could I create multiple KafkaStreams instances with different properties that connect to different clusters?
Yes, absolutely. But those different instances will be different applications. (Think "consumer groups".)
Update:
Within a single JVM, you can create as many KafkaStreams instances as you like. You can also configure them to connect to different clusters (and you can use the same KStreamBuilder for all of them if you want to do the same processing).
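To illustrate the point above, here is a minimal sketch of the per-instance configuration, using only java.util.Properties so it runs without Kafka on the classpath. The keys "application.id" and "bootstrap.servers" are the standard Kafka Streams config names; the class name, application ids, host names, and the topology variable mentioned in the comments are hypothetical placeholders, not taken from the question.

```java
import java.util.Properties;

public class TwoClusterConfigs {

    // Builds the configuration for one KafkaStreams instance.
    // Each instance gets exactly one "bootstrap.servers" value,
    // which is why one instance maps to one cluster.
    static Properties configFor(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical application ids and broker addresses for two separate clusters.
        Properties clusterA = configFor("my-app-cluster-a", "kafka-a.example.com:9092");
        Properties clusterB = configFor("my-app-cluster-b", "kafka-b.example.com:9092");

        // With kafka-streams on the classpath, each config would back its own
        // instance in the same JVM (topology is assumed to be built elsewhere):
        //   KafkaStreams streamsA = new KafkaStreams(topology, clusterA);
        //   KafkaStreams streamsB = new KafkaStreams(topology, clusterB);
        //   streamsA.start();
        //   streamsB.start();
        System.out.println(clusterA.getProperty("bootstrap.servers"));
        System.out.println(clusterB.getProperty("bootstrap.servers"));
    }
}
```

The two instances are independent applications as far as Kafka is concerned: each one reads from and writes to its own cluster only, and no data flows between the clusters unless something else copies it.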