
When to use GlobalKTable over KTable when 1 partition is used

I understand the differences between the two, but I still reach for KTable by default without really knowing when to prefer a GlobalKTable.

Please share your experience: when is a GlobalKTable a must, and when should it be avoided?

Aaron_ab asked Jan 28 '23 00:01



1 Answer

The key difference is that a KTable is partitioned: if the underlying topic has N partitions, each application instance has access only to the data in the partitions assigned to it, and not to the data in the partitions managed by other instances.

A GlobalKTable, by contrast, loads the data from all partitions of the topic into every instance. You would use it, for example, to join against a set of external data whose partitioning is not directly linked to that of the incoming data (or whose relationship to it cannot be predicted).

For example, say you have a stream from a users topic, with default round-robin partitioning, where each record has a country field, and you need to enrich that stream with data about each user's country. You can load the country data into a GlobalKTable and join the users stream with it on the country field.
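To make the difference in visibility concrete, here is a toy simulation in plain Java. It does not use the Kafka Streams API: the class name `TableViewDemo`, the helper methods, the sample data, and the `hashCode`-based partitioner are all made up for illustration (Kafka's default partitioner works differently). It only shows that an instance owning one partition sees a slice of the country table, while a global copy sees everything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and not part of Kafka Streams.
class TableViewDemo {
    static final int PARTITIONS = 3;

    // Stand-in partitioner (assumption): NOT Kafka's real default partitioner.
    static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Sample "countries" topic: country code -> country name.
    static Map<String, String> countries() {
        Map<String, String> topic = new LinkedHashMap<>();
        topic.put("US", "United States");
        topic.put("DE", "Germany");
        topic.put("JP", "Japan");
        return topic;
    }

    // KTable-like view: only the keys that hash to the given partition.
    static Map<String, String> partitionedView(Map<String, String> topic, int partition) {
        Map<String, String> view = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : topic.entrySet()) {
            if (partitionFor(e.getKey()) == partition) {
                view.put(e.getKey(), e.getValue());
            }
        }
        return view;
    }

    // GlobalKTable-like view: a full copy of the topic on every instance.
    static Map<String, String> globalView(Map<String, String> topic) {
        return new LinkedHashMap<>(topic);
    }

    // Inner-join style enrichment: each user is {name, countryCode};
    // users whose country is missing from the table are dropped.
    static List<String> enrich(List<String[]> users, Map<String, String> table) {
        List<String> out = new ArrayList<>();
        for (String[] user : users) {
            String countryName = table.get(user[1]);
            if (countryName != null) {
                out.add(user[0] + " -> " + countryName);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> users = List.of(
                new String[] {"alice", "US"},
                new String[] {"bob", "DE"},
                new String[] {"carol", "JP"});
        Map<String, String> topic = countries();

        System.out.println("global view:      " + enrich(users, globalView(topic)));
        for (int p = 0; p < PARTITIONS; p++) {
            System.out.println("partition " + p + " view: " + enrich(users, partitionedView(topic, p)));
        }
    }
}
```

With the global view every user finds a match, whatever instance processes it. With a partitioned view, a user is only enriched if its record happens to be processed by the instance that owns its country's partition, which is exactly what repartitioning has to guarantee.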

Since a GlobalKTable gives every instance access to all of the potentially joinable data, it can be much more efficient than a KTable for smaller data sets, because you don't need to repartition the stream for the join (all of the data is already there). But you should be aware of the size: every instance has to hold the entire data set. This is why it is normally used for limited-size data collections rather than very large ones.

If you join a KStream with a KTable on anything other than the stream's current key, the stream first has to be repartitioned (which creates an internal topic) so that its records are regrouped according to the join key.
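The regrouping step can be pictured with another small plain-Java sketch. Again, this is not Kafka Streams code: `RepartitionDemo`, its methods, the sample users, and the `hashCode`-based partitioner are hypothetical stand-ins. It routes the same user records first by user id and then by country, and checks whether each record ends up on the partition that would hold its country in a partitioned table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: these names are hypothetical and not part of Kafka Streams.
class RepartitionDemo {
    static final int PARTITIONS = 3;

    // Stand-in partitioner (assumption): NOT Kafka's real default partitioner.
    static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Sample users, each as {userId, countryCode}.
    static List<String[]> users() {
        return List.of(
                new String[] {"u1", "US"},
                new String[] {"u2", "DE"},
                new String[] {"u3", "JP"},
                new String[] {"u4", "US"});
    }

    // Assign each record to a partition, keyed either by country (the join key)
    // or by user id (the original key).
    static Map<Integer, List<String[]>> route(List<String[]> records, boolean keyByCountry) {
        Map<Integer, List<String[]>> byPartition = new TreeMap<>();
        for (String[] r : records) {
            String key = keyByCountry ? r[1] : r[0];
            byPartition.computeIfAbsent(partitionFor(key), p -> new ArrayList<>()).add(r);
        }
        return byPartition;
    }

    // True if every record sits on the partition that its country code maps to,
    // i.e. the partition where a partitioned country table would hold the match.
    static boolean alignedWithCountryTable(Map<Integer, List<String[]>> routed) {
        for (Map.Entry<Integer, List<String[]>> e : routed.entrySet()) {
            for (String[] r : e.getValue()) {
                if (partitionFor(r[1]) != e.getKey()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("keyed by user id, aligned: "
                + alignedWithCountryTable(route(users(), false)));
        System.out.println("re-keyed by country, aligned: "
                + alignedWithCountryTable(route(users(), true)));
    }
}
```

Re-keying by country always restores the alignment, at the cost of shuffling every record through another topic. A GlobalKTable avoids that shuffle by keeping the whole table on each instance.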

Similarly, if you are using the Processor API and query a KTable from an instance, you will see only the data for the partitions assigned to that instance, and not the data held by the other instances.

UPDATE: Also see @matthias-j-sax comment on synchronization.

xmar answered Mar 16 '23 07:03
