Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Flink+Kafka reset checkpoint and offset

In short, I'd like to re-run a Flink pipeline on data in Kafka from the beginning.

Flink 0.10.2, Kafka 0.8.2.

I have a tweets topic in Kafka with retention 2 hours, and a pipeline in Flink that counts tweets with a sliding window of 5 minutes every 10s.

If I interrupt the pipeline and I re-run it, I'd like it to re-read older tweets, thus emitting the count of 5 min worth of tweets. Instead, it seems to restart from newly arrived tweets, so it takes 5 minutes before the count is "at regime".

I've tried both auto.offset.reset = smallest/earliest, and changing the group.id, but unsuccessfully. I also tried to manually change offsets in Kafka as described here: https://metabroadcast.com/blog/resetting-kafka-offsets

I then assume that the issue can be related to Flink's checkpointing, but I have no clue/can't find information on how to reset that.

Can anyone share some working code? Thanks, E.

like image 273
ecesena Avatar asked Sep 18 '25 08:09

ecesena


1 Answers

To re-read everything available in a Kafka topic, setting a new "group.id" and the "auto.offset.reset" to "earliest" should be sufficient.

If that doesn't work, there's something wrong.

like image 128
Robert Metzger Avatar answered Sep 21 '25 03:09

Robert Metzger