Flink+Kafka reset checkpoint and offset

Question

In short, I'd like to re-run a Flink pipeline on data in Kafka from the beginning.

Flink 0.10.2, Kafka 0.8.2.

I have a tweets topic in Kafka with retention 2 hours, and a pipeline in Flink that counts tweets with a sliding window of 5 minutes every 10s.

If I interrupt the pipeline and I re-run it, I'd like it to re-read older tweets, thus emitting the count of 5 min worth of tweets. Instead, it seems to restart from newly arrived tweets, so it takes 5 minutes before the count is "at regime".

I've tried both auto.offset.reset = smallest/earliest, and changing the group.id, but unsuccessfully. I also tried to manually change offsets in Kafka as described here: https://metabroadcast.com/blog/resetting-kafka-offsets

I then assume that the issue can be related to Flink's checkpointing, but I have no clue/can't find information on how to reset that.

Can anyone share some working code? Thanks, E.

Robert Metzger · Accepted Answer

To re-read everything available in a Kafka topic, setting a new "group.id" and the "auto.offset.reset" to "earliest" should be sufficient.

If that doesn't work, there's something wrong.

Flink+Kafka reset checkpoint and offset

Tags:

apache-kafka

kafka-consumer-api

apache-flink

flink-streaming

ecesena

1 Answers

Robert Metzger

Recent Activity

Donate For Us

Flink+Kafka reset checkpoint and offset

Tags:

apache-kafka

kafka-consumer-api

apache-flink

flink-streaming

ecesena

1 Answers

Robert Metzger

Related questions

Recent Activity

Donate For Us