Kafka Streams API: KStream to KTable

Tags:

apache-kafka-streams

I have a Kafka topic where I send location events (key=user_id, value=user_location). I am able to read and process it as a KStream:

KStreamBuilder builder = new KStreamBuilder();
KStream<String, Location> locations = builder
        .stream("location_topic")
        .map((k, v) -> {
            // some processing here, omitted for clarity
            Location location = new Location(lat, lon);
            return new KeyValue<>(k, location);
        });

That works well, but I'd like to have a KTable with the last known position of each user. How could I do it?

I am able to do it by writing to and reading from an intermediate topic:

// write to intermediate topic
locations.to(Serdes.String(), new LocationSerde(), "location_topic_aux");

// build KTable from intermediate topic
KTable<String, Location> table = builder.table("location_topic_aux", "store");

Is there a simple way to obtain a KTable from a KStream? This is my first app using Kafka Streams, so I'm probably missing something obvious.


asked Mar 21 '17 20:03

1 Answer

Update:

Kafka 2.5 adds a new method, KStream#toTable(), which provides a convenient way to convert a KStream into a KTable. For details see: https://cwiki.apache.org/confluence/display/KAFKA/KIP-523%3A+Add+KStream%23toTable+to+the+Streams+DSL
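With Kafka 2.5 or later the conversion becomes a single call. The following is a minimal sketch, assuming the kafka-streams and kafka-streams-test-utils artifacts (2.5+) are on the classpath. Long values stand in for the question's Location type, and the class name, the helper latestFor and the store name latest-location-store are hypothetical. The topology is run in TopologyTestDriver, so no broker is needed:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ToTableExample {

    // Hypothetical store name, chosen only for this sketch.
    static final String STORE = "latest-location-store";

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // Long values stand in for the Location type from the question.
        KStream<String, Long> locations = builder.stream(
                "location_topic", Consumed.with(Serdes.String(), Serdes.Long()));
        // KStream#toTable() (Kafka 2.5+): a later record for a key replaces the earlier one.
        // Passing Materialized gives the table a named, queryable state store.
        locations.toTable(
                Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as(STORE)
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.Long()));
        return builder.build();
    }

    // Hypothetical helper: runs the topology in TopologyTestDriver and
    // returns what the table's state store holds for the given key.
    static Long latestFor(List<KeyValue<String, Long>> records, String key) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "to-table-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092"); // never contacted
        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
            TestInputTopic<String, Long> input = driver.createInputTopic(
                    "location_topic", Serdes.String().serializer(), Serdes.Long().serializer());
            input.pipeKeyValueList(records);
            KeyValueStore<String, Long> store = driver.getKeyValueStore(STORE);
            return store.get(key);
        }
    }

    public static void main(String[] args) {
        List<KeyValue<String, Long>> records = List.of(
                KeyValue.pair("alice", 1L),
                KeyValue.pair("bob", 10L),
                KeyValue.pair("alice", 3L));
        System.out.println(latestFor(records, "alice")); // prints 3
    }
}
```

The Materialized argument is what guarantees a named store here; toTable() without it may leave the table unmaterialized unless a downstream operation requires a store.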

Original Answer:

There is no straightforward way to do this at the moment. Your approach is absolutely valid, as discussed in the Confluent FAQ: http://docs.confluent.io/current/streams/faq.html#how-can-i-convert-a-kstream-to-a-ktable-without-an-aggregation-step

This is the simplest approach with regard to the code. However, it has two disadvantages: (a) you need to manage an additional topic, and (b) it causes additional network traffic because the data is written to and re-read from Kafka.
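For reference, the intermediate-topic approach written against the newer StreamsBuilder API (which replaced KStreamBuilder in Kafka 1.0) might look like the following minimal sketch. It assumes kafka-streams and kafka-streams-test-utils 2.5+ on the classpath and uses Long values in place of Location; the class name and the helper latestFor are hypothetical. In a real deployment you would create location_topic_aux yourself. TopologyTestDriver feeds records written to a topic that is also a source topic back into the topology, which lets the round trip be exercised without a broker:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class IntermediateTopicExample {

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Long> locations = builder.stream(
                "location_topic", Consumed.with(Serdes.String(), Serdes.Long()));
        // Step 1: write the derived stream to a topic you manage yourself.
        locations.to("location_topic_aux", Produced.with(Serdes.String(), Serdes.Long()));
        // Step 2: read that topic back as a KTable backed by the store "store".
        builder.table(
                "location_topic_aux",
                Consumed.with(Serdes.String(), Serdes.Long()),
                Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("store"));
        return builder.build();
    }

    // Hypothetical helper: pipes records in and reads the table's store.
    static Long latestFor(List<KeyValue<String, Long>> records, String key) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "aux-topic-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092"); // never contacted
        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
            TestInputTopic<String, Long> input = driver.createInputTopic(
                    "location_topic", Serdes.String().serializer(), Serdes.Long().serializer());
            input.pipeKeyValueList(records);
            KeyValueStore<String, Long> store = driver.getKeyValueStore("store");
            return store.get(key);
        }
    }

    public static void main(String[] args) {
        List<KeyValue<String, Long>> records = List.of(
                KeyValue.pair("alice", 1L),
                KeyValue.pair("alice", 3L));
        System.out.println(latestFor(records, "alice"));
    }
}
```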

There is one alternative, using a "dummy-reduce":

KStreamBuilder builder = new KStreamBuilder();
KStream<String, Long> stream = ...; // some computation that creates the derived KStream

KTable<String, Long> table = stream.groupByKey().reduce(
    new Reducer<Long>() {
        @Override
        public Long apply(Long aggValue, Long newValue) {
            // ignore the previous value and keep only the newest one
            return newValue;
        }
    },
    "dummy-aggregation-store");

This approach is somewhat more complex in terms of code than option 1 (the intermediate topic), but it has the advantages that (a) no manual topic management is required and (b) the data does not have to be re-read from Kafka.
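On newer versions the same dummy reduce can be written with a lambda and a Materialized store name instead of the older String-store-name overload. This is a minimal sketch, assuming kafka-streams and kafka-streams-test-utils 2.5+ on the classpath (Grouped needs 2.1+, TestInputTopic needs 2.4+); the class name and the helper latestFor are hypothetical, and the source topic location_topic stands in for the elided derived stream:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class DummyReduceExample {

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // Stand-in for the derived KStream from the answer.
        KStream<String, Long> stream = builder.stream(
                "location_topic", Consumed.with(Serdes.String(), Serdes.Long()));
        // "Dummy" reduce: discard the previous aggregate and keep the newest value,
        // so the resulting KTable holds the latest value per key.
        stream.groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
              .reduce(
                  (aggValue, newValue) -> newValue,
                  Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("dummy-aggregation-store"));
        return builder.build();
    }

    // Hypothetical helper: pipes records in and reads the aggregation store.
    static Long latestFor(List<KeyValue<String, Long>> records, String key) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dummy-reduce-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092"); // never contacted
        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
            TestInputTopic<String, Long> input = driver.createInputTopic(
                    "location_topic", Serdes.String().serializer(), Serdes.Long().serializer());
            input.pipeKeyValueList(records);
            KeyValueStore<String, Long> store = driver.getKeyValueStore("dummy-aggregation-store");
            return store.get(key);
        }
    }

    public static void main(String[] args) {
        List<KeyValue<String, Long>> records = List.of(
                KeyValue.pair("alice", 1L),
                KeyValue.pair("bob", 10L),
                KeyValue.pair("alice", 3L));
        System.out.println(latestFor(records, "alice")); // prints 3
    }
}
```

The store is backed by an internal changelog topic that Kafka Streams creates and manages, which is the extra storage and traffic this option trades for not managing a topic yourself.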

You need to decide for yourself which approach you prefer:

In option 2, Kafka Streams will create an internal changelog topic to back up the KTable for fault tolerance. Thus, both approaches require some additional storage in Kafka and result in additional network traffic. Overall, it’s a trade-off between slightly more complex code in option 2 versus manual topic management in option 1.


answered Sep 26 '22 00:09

Matthias J. Sax

asked by Guido