What exactly happens when a repartition occurs in a Kafka stream?

Say I have a stream of employees, keyed by empId, where each record also includes a departmentId. I want to aggregate by department. So I do a selectKey() (with a mapper that returns departmentId), then groupByKey() (or I could just do a groupBy(...), I assume), and then, say, count(). What exactly happens? I gather that it does a "repartition". I think what happens is that it writes to an "internal" topic, which is just a regular topic with a derived name, created automatically. That is, a topic shared by all instances of the stream application, not just one (i.e. not local). So the aggregation is across all records with the new key, not just the messages seen by one source stream instance (I think). Is that correct?
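The mechanics described above can be sketched without Kafka at all. The following is a minimal simulation (plain Java, no Kafka dependency; `Employee`, `partitionFor`, and `countByDept` are made-up names, and `String.hashCode()` stands in for Kafka's real murmur2-based default partitioner): records are rewritten to a repartition "topic" partitioned by the new key, so every record for one department lands in the same partition regardless of which source partition it came from, and the downstream count is therefore global per key.

```java
import java.util.*;

public class RepartitionSketch {
    // Hypothetical employee record: empId is the old key, deptId the new one.
    public record Employee(String empId, String deptId) {}

    // Kafka's default partitioner is hash(serialized key) mod #partitions
    // (murmur2 in reality; String.hashCode() stands in for this sketch).
    public static int partitionFor(String key, int numPartitions) {
        return Math.abs(key.hashCode()) % numPartitions;
    }

    // selectKey(deptId) + groupByKey() + count(), modeled as: write every
    // record to a repartition "topic" partitioned by the NEW key, then
    // count per key over that topic.
    public static Map<String, Long> countByDept(List<Employee> employees,
                                                int numPartitions) {
        Map<Integer, List<Employee>> repartitionTopic = new HashMap<>();
        for (Employee e : employees) {
            int p = partitionFor(e.deptId(), numPartitions);
            repartitionTopic.computeIfAbsent(p, k -> new ArrayList<>()).add(e);
        }
        // All records with the same deptId now sit in one partition, so the
        // count per department is global, not per source-stream instance.
        Map<String, Long> counts = new HashMap<>();
        for (List<Employee> partition : repartitionTopic.values())
            for (Employee e : partition)
                counts.merge(e.deptId(), 1L, Long::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
            new Employee("e1", "sales"), new Employee("e2", "eng"),
            new Employee("e3", "sales"), new Employee("e4", "eng"),
            new Employee("e5", "sales"));
        // Prints the per-department counts (map iteration order may vary).
        System.out.println(countByDept(employees, 4));
    }
}
```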

I've not found a comprehensive description of repartitioning. Can anybody point me to a good article on this?

mconner asked Mar 07 '19 20:03


People also ask

How does repartition work in Kafka?

Partitioning takes the single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster. This way, the work of storing messages, writing new messages, and processing existing messages can be split among many nodes in the cluster.

What does the sink do in a Kafka stream topology?

Finally, a sink is a node in the graph that receives records from upstream nodes and writes them to a Kafka topic. A Topology lets you construct an acyclic graph of these nodes; the topology is then passed into a new KafkaStreams instance, which begins consuming, processing, and producing records.

How does Kafka aggregation work?

In the Kafka Streams DSL, an input stream of an aggregation operation can be a KStream or a KTable, but the output stream will always be a KTable. This allows Kafka Streams to update an aggregate value upon the out-of-order arrival of further records after the value was produced and emitted.
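That "output is always a KTable" behavior can be sketched in a few lines of plain Java (no Kafka dependency; `applyCounts` is a made-up name): the aggregate is a key-to-value table, and each arriving record, including a late or out-of-order one, simply updates the current value for its key and emits the new aggregate.

```java
import java.util.*;

public class AggregateSketch {
    // The output of count() behaves like a KTable: an ever-updating
    // key -> value view. Each arriving record (even a late one) updates
    // the current aggregate for its key and emits the new value.
    public static List<String> applyCounts(List<String> deptIds) {
        Map<String, Long> table = new HashMap<>();   // KTable-like state store
        List<String> changelog = new ArrayList<>();  // emitted updates
        for (String dept : deptIds) {
            long updated = table.merge(dept, 1L, Long::sum);
            changelog.add(dept + "=" + updated);
        }
        return changelog;
    }

    public static void main(String[] args) {
        // The second "sales" record just bumps the existing aggregate.
        System.out.println(applyCounts(List.of("sales", "eng", "sales")));
        // [sales=1, eng=1, sales=2]
    }
}
```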

Does Kafka Streams support both stateful and stateless operations?

Yes. Kafka Streams supports stateless operations such as map() and filter() as well as stateful operations such as aggregations, joins, and windowing, which are backed by local state stores.


1 Answer

What you describe is exactly what is happening.

A repartition step is the same as a through() (auto-inserted into the processing topology), which is a shortcut for to("topic") plus builder.stream("topic").
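On the "derived name" the question mentions: Kafka Streams names internal repartition topics by prefixing the application.id and appending a "-repartition" suffix. A tiny sketch of that convention (the operator-name segment shown is illustrative; the actual generated segment depends on the topology and any Grouped/Named names you supply):

```java
public class InternalTopicName {
    // Internal repartition topics follow the pattern
    // <application.id>-<generated or user-supplied name>-repartition.
    public static String repartitionTopic(String applicationId,
                                          String operatorName) {
        return applicationId + "-" + operatorName + "-repartition";
    }

    public static void main(String[] args) {
        // Illustrative only: the generated segment varies per topology.
        System.out.println(repartitionTopic(
            "dept-counter", "KSTREAM-AGGREGATE-STATE-STORE-0000000002"));
        // dept-counter-KSTREAM-AGGREGATE-STATE-STORE-0000000002-repartition
    }
}
```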

It's also illustrated and explained in this blog post: https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/

Matthias J. Sax answered Sep 23 '22 11:09