Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Kafka Streams - Is it possible to run remote interactive queries without a local Kafka Streams instance

I have a few Kafka Streams application instances that I'd like to query remotely.

All of the instances are currently listening on a specified host:port pair, and each instance is able to query its own local state stores and communicate these values via a REST service.

+------------------+  +------------------+  +------------------+
|                  |  |                  |  |                  |
|  instance1:9191  |  |  instance2:9292  |  |  instance3:9393  |
|                  |  |                  |  |                  |
+------------------+  +------------------+  +------------------+

I would like a separate application to be able to query the state stores in these instances:

             consumer group                         separate application
+---------------------------------------+              _____
|   [instance1] [instance2] [instance3] |  <~-------  | app |
+---------------------------------------+              -----

The separate app would utilize the same logic in StreamsMetadataState::getAllMetadataForStore to get the all of the active host/port pairs for the running instances of my application, run the remote queries via the REST service, and aggregate the data inside it's own application logic.

However, I am having difficulty implementing this. Since the host/port pairs seem to be communicated via the consumer group protocol, it looks like I'm required to actually instantiate another Kafka Streams instance (i.e. another member of the consumer group) in order to take advantage of remote interactive queries.

My questions are:

  • Is it possible to find the host/value pairs for all running instances of an application without also running a local Kafka Streams instance in the same consumer group? (I highlight running because I don't mind instantiating a dummy instance of the Kafka Streams app just to get the host/port meta data, but there is a validateRunning check that prevents me from doing so)
  • Are there problems with the above design (running a separate app to query all instances of a Kafka Streams app)? i.e. maybe the behavior I'm talking about isn't supported because what I'm doing has ramifications that I haven't considered yet?

It seems like maybe there should be a static method for getting state store meta data that would allow us to pass in the whatever values are being extracted from the builder object directly. i.e.

KafkaStreams::getMetaDataForStore(streamsConfig, storeName);
like image 526
foxygen Avatar asked Mar 10 '17 22:03

foxygen


People also ask

When should you not use Kafka Streams?

As point 1 if having just a producer producing message we don't need Kafka Stream. If consumer messages from one Kafka cluster but publish to different Kafka cluster topics. In that case, you can even use Kafka Stream but have to use a separate Producer to publish messages to different clusters.

Can we query data from Kafka?

Yes, you can do it with interactive queries. You can create a kafka stream to read the input topic and generate a state store ( in memory/rocksdb and synchronize with kafka ). This state store is queryable by key ( ReadOnlyKeyValueStore ).

How do I query a Kafka message?

The only fast way to search for a record in Kafka (to oversimplify) is by partition and offset. The new producer class can return, via futures, the partition and offset into which a message was written. You can use these two values to very quickly retrieve the message.


1 Answers

  • Is it possible to find the host/value pairs for all running instances of an application without also running a local Kafka Streams instance in the same consumer group? (I highlight running because I don't mind instantiating a dummy instance of the Kafka Streams app just to get the host/port meta data, but there is a validateRunning check that prevents me from doing so)

Why don't you add a new REST API method to your (first) Kafka Streams application that exposes the currently active host/port pairs to your second app? The app instances would be of course have this information readily available.

The second app -- "the separate app" -- can then query any of the (first) Kafka Streams app instances via this REST method to discover all the running app instances. Here, you wouldn't need to run a dummy KafkaStreams instance in the second app at all.

  • Are there problems with the above design (running a separate app to query all instances of a Kafka Streams app)? i.e. maybe the behavior I'm talking about isn't supported because what I'm doing has ramifications that I haven't considered yet?

See above. Nothing stops you from adding further methods to the REST layer of the Kafka Streams application. After all, the part of your (first) application that uses the Kafka Streams API doesn't need to be the only part of that app. :-) I think your problem might be that your thinking is kinda "boxed in" in the sense that you feel obliged to having to do everything in your app through the Kafka Streams API -- but this is not so. After all, one of the motivations behind the Kafka Streams API was to let you mix it with other APIs and libraries that you would like to leverage in your application.

like image 151
Michael G. Noll Avatar answered Sep 21 '22 23:09

Michael G. Noll