I'm building a KTable from an input topic and joining it with a KStream, running two instances of the Kafka Streams application.
The input topic for the KTable is already a log-compacted topic. So when one of my application instances goes down, the other instance's state store seems to be refreshed with the whole state by reading from that compacted input topic.
So is there no need to enable logging (a changelog topic) for my KTable store?
My source log-compacted input topic could have millions of records, so if I enable logging on that KTable state store, will it improve the state store refresh time after a failure, or will it have no effect since the source topic is already log compacted? Thanks!
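For context, a minimal sketch of what my topology looks like (topic names, serdes, and the store name below are placeholders, not my actual code):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

StreamsBuilder builder = new StreamsBuilder();

// KTable built from the log-compacted input topic
KTable<String, String> table = builder.table(
        "compacted-input-topic",
        Consumed.with(Serdes.String(), Serdes.String()),
        Materialized.as("ktable-store"));

// Stream that is joined against the KTable
KStream<String, String> stream = builder.stream(
        "stream-input-topic",
        Consumed.with(Serdes.String(), Serdes.String()));

// Stream-table join; the joined result goes to an output topic
stream.join(table, (streamValue, tableValue) -> streamValue + "|" + tableValue)
      .to("output-topic");
```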
So is there no need to enable logging (a changelog topic) for my KTable store?
That's correct. Kafka Streams will not create an additional changelog topic, but will use the input topic for recovery (no need to duplicate data).
so if I enable logging on that KTable state store
How would you do that?
will it improve the state store refresh time after a failure, or will it have no effect since the source topic is already log compacted?
In general, you would not gain anything. As you stated correctly, the input topic is compacted anyway, so both topics would contain roughly the same data.
If you want to decrease failover time, you should configure StandbyTasks via the StreamsConfig parameter num.standby.replicas (the default is 0, so you could set it to 1). Cf. https://docs.confluent.io/current/streams/developer-guide.html#state-restoration-during-workload-rebalance
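A minimal sketch of that configuration (the application id and bootstrap servers are placeholders, and `builder` is assumed to be the StreamsBuilder from a topology like the one above):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-ktable-join-app");  // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder

// Keep one warm standby replica of each state store on another instance,
// so failover does not have to restore the whole store from the topic.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

// `builder` is the StreamsBuilder holding the topology sketched above
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
```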