I am new to Kafka Streams and I have been reading the documentation on how to set up a Kafka Streams application.
What is not clear to me, though, is how the data is handled: what is stored in memory and what is stored on disk. I have seen RocksDB mentioned somewhere, but not in the Streams documentation.
The problem I am trying to solve is as follows. I have two Kafka topics, both of a key-value store type, that keep the oldest value for each key. In my Streams application I want to join the two topics and write the join result back to Kafka, so that it can later be consumed by some sink. What worries me is that it is not clear how the joins are performed. Both topics will hold GBs of data, so there is no chance all of it will fit into the Streams application's memory.
You can read each topic as a KTable and do a table-table join:
// assuming String keys and values; adjust the generics and Serdes to your data
KTable<String, String> table1 = builder.table("topic-1");
KTable<String, String> table2 = builder.table("topic-2");
// the ValueJoiner is called for each key that is present in both tables
KTable<String, String> joinResult = table1.join(table2, (v1, v2) -> v1 + v2);
// KTable no longer has to() in newer versions; convert to a stream first
joinResult.toStream().to("output-topic");
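For completeness, here is a minimal, self-contained sketch of how such a topology could be wired up and started using the newer StreamsBuilder API (Kafka 1.0+). The topic names, the String serdes, and the concatenating joiner are placeholder assumptions; adapt them to your data:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

public class TableJoinApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "table-join-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // reading a topic as a KTable materializes it in a local state store
        KTable<String, String> table1 = builder.table("topic-1");
        KTable<String, String> table2 = builder.table("topic-2");
        // inner join: emits a record for each key present in both tables
        KTable<String, String> joinResult = table1.join(table2, (v1, v2) -> v1 + v2);
        joinResult.toStream().to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}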
For more details, see the docs: http://docs.confluent.io/current/streams/developer-guide.html#ktable-ktable-join
Also check out the examples: https://github.com/confluentinc/examples/tree/3.3.0-post/kafka-streams
At runtime, both topics will be materialized in RocksDB state stores. RocksDB keeps only a portion of the data in memory and spills the rest to disk, so the state does not need to fit into the application's heap. Also note that a single RocksDB instance only needs to hold the data of a single input partition. Compare http://docs.confluent.io/current/streams/architecture.html#parallelism-model
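The directory that backs those RocksDB stores is configurable, so you can point it at a disk with enough space for your GBs of state. A small example of the relevant settings (the path and thread count are just illustrative values):

// local state location for the RocksDB stores (defaults to /tmp/kafka-streams)
props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
// input partitions are distributed over all threads/instances, so each local
// RocksDB store only holds the data of the partitions assigned to it
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);

Running multiple instances of the application divides the partitions (and thus the state) across them, which is how Kafka Streams scales beyond a single machine's disk and memory.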