When you don't use a windowed setting with kafka streams, how does aggregation work? It seems to have some kind of window when using the materialized key value store. I cant seem to find the threshold for when it will create a new aggregation. This will help me decide if I need to add a custom window for my use case.
From this Confluent link :
Windowing allows you to bucket stateful operations by time, without which your aggregations would endlessly accumulate. A window gives you a snapshot of an aggregate within a given timeframe, and can be set as hopping, tumbling, session, or sliding.
So when you use a windowed setting with kafka streams, two records with the same aggregation key can "belong" to different aggregates if they come in different timeframes (the notion of timeframe here depends on the type of windowing you use, i.e. hopping, tumbling, session, or sliding).
But if you don't use a windowed setting with kafka streams, by default all of the records with the same aggregation key will be acumulated in the same aggregate as there is no notion of timeframe.
Details :
When you don't use windowing during an aggregation, you will by default have a changelog topic with the configuration log.cleanup.policy=compact.
So the changelog topic will always have at least the last aggregation result (which will be used for subsequent aggregations) as per compaction rules.
And so will the (Persistent or In-Memory) KeyValueStore backed by the changelog topic as this state store does not have a retention time unlike WindowStore or SessionStore.
And if the KeyValueStore is lost due to application failure, it will be reconstructed from the changelog topic, so it will still have the last aggregation result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With