
How to join multiple Kafka topics?

I have three topics:

  • The 1st topic holds general application logs (log4j): HTTP API requests/responses, warnings, exceptions, etc. One logical business request can have multiple logs associated with it. (These logs occur within seconds of each other.)
  • The 2nd topic contains commands issued by the above business request, which other services act on. (The commands also occur within seconds of each other, but possibly a couple of minutes after the original request.)
  • The 3rd topic contains events generated by the actions of those other services. (Most events complete within seconds, but some can take up to 3-5 days to be received.)

So a single logical business request can have multiple logs, commands and events, all associated with it by a uuid that the microservices pass to each other.

What technologies/patterns can be used to read the 3 topics, join them into a single JSON document, and then dump that to, let's say, Elasticsearch?

Streaming?

user432024 asked Mar 11 '18 22:03



2 Answers

You can use Kafka Streams or KSQL to achieve this. Which one you choose depends on your preference for and experience with Java, and on the specifics of the joins you want to do.

KSQL is the streaming SQL engine for Apache Kafka: using SQL alone, you can declare stream processing applications against Kafka topics to filter, enrich, and aggregate them. Currently only stream-table joins are supported. You can see an example in this article here.

The Kafka Streams API is part of Apache Kafka: it is a Java library for stream processing of data in Kafka. KSQL is built on top of it, and it offers greater processing flexibility, including stream-stream joins.
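To make the target output concrete, here is a minimal sketch in plain Python of what "join by uuid into one JSON document" means. It does not use Kafka Streams or KSQL at all: the three topics are stood in for by in-memory lists, and the field names (`uuid`, `msg`, `command`, `event`), the sample records, and the helper `merge_by_uuid` are all assumptions made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample records standing in for the three topics.
# Field names ("uuid", "msg", "command", "event") are assumptions.
logs = [
    {"uuid": "req-1", "msg": "POST /orders received"},
    {"uuid": "req-1", "msg": "validation passed"},
    {"uuid": "req-2", "msg": "GET /orders/7"},
]
commands = [{"uuid": "req-1", "command": "ReserveStock"}]
events = [{"uuid": "req-1", "event": "StockReserved"}]


def merge_by_uuid(**topics):
    """Fold records from several topics into one document per uuid."""
    # Every uuid starts with an empty list for each topic name.
    docs = defaultdict(lambda: {name: [] for name in topics})
    for name, records in topics.items():
        for record in records:
            # Drop the join key from the payload; it is stored once per document.
            payload = {k: v for k, v in record.items() if k != "uuid"}
            docs[record["uuid"]][name].append(payload)
    return {uuid: {"uuid": uuid, **parts} for uuid, parts in docs.items()}


documents = merge_by_uuid(logs=logs, commands=commands, events=events)
print(json.dumps(documents["req-1"], indent=2))
```

This sketch assumes all records are already available. Since some events can arrive 3-5 days late, a real streaming job would need to keep updating (or re-emitting) the document for a uuid as new records show up, which is not modelled here.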

Robin Moffatt answered Nov 04 '22 23:11


You can use KSQL to join the streams.

  1. KSQL has two constructs: Table and Stream.
  2. Currently, joins are supported only between a Stream and a Table, so you need to decide which topic is a good fit for which construct.
  3. You don't need windowing for these stream-table joins.
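Roughly, the reason no window is needed is that the table side only holds the latest value per key, so each incoming stream record is simply looked up against whatever the table contains at that moment. A minimal sketch of that idea in plain Python follows; it is not KSQL or the Kafka Streams API, and the names `table`, `upsert` and `join_stream_record`, plus the sample keys and fields, are hypothetical.

```python
# Table side: latest value per key (a changelog collapsed to current state).
table = {}


def upsert(key, value):
    """Apply one table update; a newer value replaces the older one."""
    table[key] = value


def join_stream_record(key, value):
    """Inner-join one stream record with the table's current row for its key."""
    row = table.get(key)
    if row is None:
        return None  # no matching row right now, so nothing is emitted
    return {"uuid": key, **value, **row}


# Hypothetical data: a command is stored in the table, then an event arrives.
upsert("req-1", {"command": "ReserveStock"})
joined = join_stream_record("req-1", {"event": "StockReserved"})
missed = join_stream_record("req-9", {"event": "Orphan"})
print(joined, missed)
```

Note the consequence for deciding which topic becomes the Table: only the most recent record per uuid survives on the table side, so a topic with several records per uuid would lose all but the last one there.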

Benefits of using KSQL.

  1. KSQL is easy to set up.
  2. KSQL is a SQL dialect, which lets you query your data quickly.

Drawbacks.

  1. It's not production-ready yet, but a release is coming up in April 2018.
  2. It's a little buggy right now, but it will certainly improve in a few months.

Please have a look.

https://github.com/confluentinc/ksql

Zamir Arif answered Nov 04 '22 23:11