In short, I'd like to re-run a Flink pipeline on data in Kafka from the beginning.
Flink 0.10.2, Kafka 0.8.2.
I have a tweets topic in Kafka with retention 2 hours, and a pipeline in Flink that counts tweets with a sliding window of 5 minutes every 10s.
If I interrupt the pipeline and I re-run it, I'd like it to re-read older tweets, thus emitting the count of 5 min worth of tweets. Instead, it seems to restart from newly arrived tweets, so it takes 5 minutes before the count is "at regime".
I've tried both auto.offset.reset = smallest/earliest
, and changing the group.id
, but unsuccessfully. I also tried to manually change offsets in Kafka as described here: https://metabroadcast.com/blog/resetting-kafka-offsets
I then assume that the issue can be related to Flink's checkpointing, but I have no clue/can't find information on how to reset that.
Can anyone share some working code? Thanks, E.
To re-read everything available in a Kafka topic, setting a new "group.id" and the "auto.offset.reset" to "earliest" should be sufficient.
If that doesn't work, there's something wrong.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With