Why do Kafka Streams threads die when the number of source topic partitions changes? Can anyone point to reading material on this?

We increased the number of partitions so that messages could be processed in parallel, because the message throughput was high. As soon as we increased the number of partitions, all the stream threads that were subscribed to that topic died. After we changed the consumer group ID and restarted the application, it worked fine.

I know that the application's changelog topic should have the same number of partitions as the source topic. I would like to know the reason behind this.

I saw this issue, but couldn't find the reason there: https://issues.apache.org/jira/browse/KAFKA-6063?jql=project%20%3D%20KAFKA%20AND%20component%20%3D%20streams%20AND%20text%20~%20%22partition%22

Basically, I would like to know the reason behind the if condition here: https://github.com/apache/kafka/blob/fdc742b1ade420682911b3e336ae04827639cc04/streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java#L122

asked Feb 12 '19 12:02

kartik7153

1 Answer

Input topic partitions define the level of parallelism, and if you have stateful operations like aggregations or joins, the state of those operations is sharded. If you have X input topic partitions, you get X tasks, each with one state shard. Furthermore, state is backed by a changelog topic in Kafka with X partitions, and each shard uses exactly one of those partitions.

If you change the number of input topic partitions to X+1, Kafka Streams tries to create X+1 tasks, each with its own store shard; however, the existing changelog topic has only X partitions. Thus, the whole partitioning of your application breaks, Kafka Streams cannot guarantee correct processing, and it shuts down with an error.
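To make the mismatch concrete, here is a minimal, dependency-free sketch of that kind of check. It is not the actual `InternalTopicManager` code; the class name, method name, topic name, and exception type are all hypothetical and chosen only for illustration.

```java
import java.util.Map;

public class ChangelogPartitionCheckSketch {

    // Hypothetical helper: rejects an existing changelog topic whose partition
    // count differs from the number of tasks (one task per input partition).
    static void validate(String changelogTopic, int requiredPartitions,
                         Map<String, Integer> existingTopics) {
        Integer existing = existingTopics.get(changelogTopic);
        if (existing != null && existing != requiredPartitions) {
            throw new IllegalStateException(
                "Existing changelog topic " + changelogTopic + " has " + existing
                    + " partitions, but " + requiredPartitions + " are required");
        }
    }

    public static void main(String[] args) {
        // Hypothetical topic name; the changelog was created when the input had 4 partitions.
        Map<String, Integer> existingTopics = Map.of("my-app-store-changelog", 4);

        validate("my-app-store-changelog", 4, existingTopics); // counts match, no error

        try {
            // The input topic now has 5 partitions, so 5 tasks need 5 changelog partitions.
            validate("my-app-store-changelog", 5, existingTopics);
        } catch (IllegalStateException e) {
            System.out.println("Shutting down: " + e.getMessage());
        }
    }
}
```

Failing fast here is the safer choice: silently continuing would leave one task without a changelog partition to restore its state from.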

Also note that Kafka Streams assumes that input data is partitioned by key. If you change the number of input topic partitions, the hash-based partitioning changes, which may result in incorrect output, too.
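The effect of changing the partition count on key placement can be sketched as follows. This is only an illustration: it uses `String.hashCode()` modulo the partition count as a stand-in, whereas Kafka's default partitioner hashes the serialized key bytes with murmur2. The class and method names are hypothetical.

```java
import java.util.List;

public class PartitionShiftSketch {

    // Stand-in for hash-based partitioning: hash(key) mod numPartitions.
    // Assumption: String.hashCode() replaces Kafka's murmur2 hash to keep the sketch dependency-free.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d");
        for (String key : keys) {
            int before = partitionFor(key, 4); // original partition count
            int after = partitionFor(key, 5);  // after adding one partition
            System.out.printf("%s: partition %d -> %d%s%n",
                key, before, after, before == after ? "" : " (moved)");
        }
    }
}
```

Any key that moves ends up in a partition whose task holds no earlier state for it, so an aggregation for that key would effectively start from scratch while the old shard keeps a stale partial result.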

In general, it's recommended to over-partition topics in the beginning to avoid this issue. If you really need to scale out, it is best to create a new topic with the new number of partitions and start a copy of the application (with a new application ID) in parallel. Afterwards, you update your upstream producer applications to write into the new topic, and finally shut down the old application.
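As a rough sketch of the "copy with a new application ID" step, the two instances could differ only in their `application.id` setting. Kafka Streams uses `application.id` as the consumer group ID and as the prefix for its internal topics, so a new ID gives the copy fresh changelog topics sized for the new input topic. The IDs and broker address below are hypothetical placeholders, and the config is built with plain `java.util.Properties` to stay dependency-free.

```java
import java.util.Properties;

public class MigratedAppConfigSketch {

    // Builds the minimal configuration that distinguishes one Streams application from another.
    static Properties configFor(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // also the consumer group ID and internal topic prefix
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical names: the old app keeps reading the old topic,
        // the new copy reads the re-partitioned topic under a new ID.
        Properties oldApp = configFor("orders-app", "localhost:9092");
        Properties newApp = configFor("orders-app-v2", "localhost:9092");

        System.out.println("old application.id = " + oldApp.getProperty("application.id"));
        System.out.println("new application.id = " + newApp.getProperty("application.id"));
    }
}
```

This also explains why changing the consumer group ID made the application in the question start again: under a new ID, the old changelog topics with the old partition count are no longer used.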

answered Sep 30 '22 17:09

Matthias J. Sax

Tags:

java

apache-kafka

apache-kafka-streams
