
Why does co-partitioning of two KStreams in Kafka require the same number of partitions for both streams?

I would like to know why co-partitioning two KStreams in Kafka requires both streams to have the same number of partitions, as stated in the documentation.

Akash Jain asked Aug 07 '17 10:08



1 Answer

As the name "co-partitioning" indicates, the goal is to route records that come from different topics but share the same key to the same Kafka Streams application instance. If the topics do not have the same number of partitions, this behavior cannot be guaranteed.

Assume topic A has 2 partitions and topic B has 3 partitions. With the default partitioner, a record's partition is derived from the hash of its key modulo the number of partitions, so a record with key X may be hashed to partitions A-0 and B-1 (i.e., not the same partition number), while a record with a different key Y may be hashed to A-0 but B-2.

Only if the number of partitions is the same for both topics do records with the same key end up in the same partition number (in different topics, of course). This allows A-0/B-0, A-1/B-1, etc. to be processed together.
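
To make the argument concrete, here is a minimal sketch of the "hash modulo partition count" idea. It is not Kafka's actual code: Kafka's default partitioner uses a murmur2 hash over the serialized key bytes, while this sketch substitutes `zlib.crc32` as a stand-in so it runs without dependencies. The names `partition_for` and `co_partitioned` are hypothetical helpers introduced only for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number as hash(key) % num_partitions.

    Assumption: crc32 stands in for the murmur2 hash used by Kafka's
    default partitioner; only the modulo step matters for the argument.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def co_partitioned(keys, partitions_a: int, partitions_b: int) -> bool:
    """Return True if every key lands on the same partition number in both topics."""
    return all(
        partition_for(k, partitions_a) == partition_for(k, partitions_b)
        for k in keys
    )


keys = [f"key-{i}" for i in range(100)]

# Same partition count: the same hash and the same modulus give the same partition number.
print("A=3, B=3 co-partitioned:", co_partitioned(keys, 3, 3))

# Different partition counts: the same hash with different moduli generally disagrees.
print("A=2, B=3 co-partitioned:", co_partitioned(keys, 2, 3))

# Show where a few individual keys end up in topic A (2 partitions) and topic B (3 partitions).
for k in keys[:5]:
    print(k, "-> A-%d, B-%d" % (partition_for(k, 2), partition_for(k, 3)))
```

With equal partition counts the two topics always agree on the partition number for a key, so a single instance can be assigned A-n and B-n together. With 2 and 3 partitions, keys that share a partition in A are spread over several partitions in B, so no pairing of partitions would bring all matching keys together.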

Matthias J. Sax answered Nov 09 '22 18:11
