Kafka Streams - Is it possible to run remote interactive queries without a local Kafka Streams instance

Tags:

apache-kafka-streams

I have a few Kafka Streams application instances that I'd like to query remotely.

All of the instances are currently listening on a specified host:port pair, and each instance is able to query its own local state stores and communicate these values via a REST service.

+------------------+  +------------------+  +------------------+
|                  |  |                  |  |                  |
|  instance1:9191  |  |  instance2:9292  |  |  instance3:9393  |
|                  |  |                  |  |                  |
+------------------+  +------------------+  +------------------+

I would like a separate application to be able to query the state stores in these instances:

             consumer group                         separate application
+---------------------------------------+              _____
|   [instance1] [instance2] [instance3] |  <~-------  | app |
+---------------------------------------+              -----

The separate app would utilize the same logic in StreamsMetadataState::getAllMetadataForStore to get the all of the active host/port pairs for the running instances of my application, run the remote queries via the REST service, and aggregate the data inside it's own application logic.

However, I am having difficulty implementing this. Since the host/port pairs seem to be communicated via the consumer group protocol, it looks like I'm required to actually instantiate another Kafka Streams instance (i.e. another member of the consumer group) in order to take advantage of remote interactive queries.

My questions are:

Is it possible to find the host/value pairs for all running instances of an application without also running a local Kafka Streams instance in the same consumer group? (I highlight running because I don't mind instantiating a dummy instance of the Kafka Streams app just to get the host/port meta data, but there is a validateRunning check that prevents me from doing so)
Are there problems with the above design (running a separate app to query all instances of a Kafka Streams app)? i.e. maybe the behavior I'm talking about isn't supported because what I'm doing has ramifications that I haven't considered yet?

It seems like maybe there should be a static method for getting state store meta data that would allow us to pass in the whatever values are being extracted from the builder object directly. i.e.

KafkaStreams::getMetaDataForStore(streamsConfig, storeName);

526

asked Mar 10 '17 22:03

foxygen

1 Answers

Is it possible to find the host/value pairs for all running instances of an application without also running a local Kafka Streams instance in the same consumer group? (I highlight running because I don't mind instantiating a dummy instance of the Kafka Streams app just to get the host/port meta data, but there is a validateRunning check that prevents me from doing so)

Why don't you add a new REST API method to your (first) Kafka Streams application that exposes the currently active host/port pairs to your second app? The app instances would be of course have this information readily available.

The second app -- "the separate app" -- can then query any of the (first) Kafka Streams app instances via this REST method to discover all the running app instances. Here, you wouldn't need to run a dummy KafkaStreams instance in the second app at all.

Are there problems with the above design (running a separate app to query all instances of a Kafka Streams app)? i.e. maybe the behavior I'm talking about isn't supported because what I'm doing has ramifications that I haven't considered yet?

See above. Nothing stops you from adding further methods to the REST layer of the Kafka Streams application. After all, the part of your (first) application that uses the Kafka Streams API doesn't need to be the only part of that app. :-) I think your problem might be that your thinking is kinda "boxed in" in the sense that you feel obliged to having to do everything in your app through the Kafka Streams API -- but this is not so. After all, one of the motivations behind the Kafka Streams API was to let you mix it with other APIs and libraries that you would like to leverage in your application.

151

answered Sep 21 '22 23:09

Michael G. Noll

Related questions
                            
                                Multiple Message Types in a Single Kafka Topic with Avro
                            
                                How to configure docker-compose.yml for Kafka local development?
                            
                                Kafka-streams: setting internal topics cleanup policy to delete doesn't work
                            
                                Updating a Debezium MySQL connector with table whitelist option
                            
                                PySpark 2.x: Programmatically adding Maven JAR Coordinates to Spark
                            
                                Spark structured streaming exactly once - Not achieved - Duplicated events
                            
                                librdkafka consumer and ssl configuration
                            
                                Push Messages from AWS Lambda to Kafka
                            
                                Spark Structured Streaming with Kafka SASL/PLAIN authentication
                            
                                Kafka on kubernetes cluster with Istio
                            
                                Removing one message from a topic in Kafka
                            
                                How to get topics list from Kafka using C#
                            
                                How does Kafka handle a consumer which is running slower than other consumers?
                            
                                How to configure the time it takes for a kafka cluster to re-elect partition leaders after stopping and restarting a broker?
                            
                                How do I ensure that logs are retained forever in Kafka?
                            
                                How can I check how much disk space is being used by Kafka
                            
                                Kafka Authentication Producer Unable to Connect Producer
                            
                                Kafka Docker - Can't produce or consume from outside of docker container
                            
                                Kafka - How to use filter and filternot at the same time?
                            
                                Spark Streaming - Batch Interval vs Processing time

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With