
AWS Kinesis Stream Checkpointing

I have an application that's capable of handling duplicate Kinesis stream records. We're considering how to handle failures, and the following approach was brought up:

If an exception is caught during processRecords, then the application doesn't checkpoint. By doing this, the record will be sent in again along with the next batch, indirectly performing a retry.

So my question is - when it comes to checkpointing for Kinesis streams, is the application expected to always checkpoint on a regular basis? Is manipulating the checkpoint mechanism considered an anti-pattern?

Thanks

ddolce asked Jan 02 '23 17:01



1 Answer

I want to first clarify something about checkpointing that may change your perspective. Unless I'm drastically misunderstanding your question, it's less "manipulating" the checkpoint mechanism and more "using it for its intended purpose".

  • Checkpointing is essentially a mechanism allowing you to restart stream processing from the last checkpointed position (instead of at the earliest available record or "now").
  • Skipping a checkpoint does NOT mean that records will automatically be retried with the next batch. You would need to handle the exception by restarting your record processor from some stream position before the error (typically the last checkpoint).
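
To make the two points above concrete, here is a minimal sketch that simulates the behavior in plain Python. It does not use the real Kinesis or KCL API: `FakeShard`, `run_consumer`, and `make_flaky_handler` are hypothetical names invented for illustration, and the "checkpoint" is just an integer offset. The point it tries to show is that a retry only happens because the consumer explicitly rewinds to the last checkpoint after a failure.

```python
class FakeShard:
    """Hypothetical in-memory stand-in for a shard (not a real Kinesis API)."""

    def __init__(self, records):
        self.records = list(records)

    def read(self, position, limit):
        # Return up to `limit` records starting at `position`.
        return self.records[position:position + limit]


def make_flaky_handler(bad_record):
    """Build a handler that fails the first time it sees `bad_record`."""
    already_failed = set()

    def handle(record):
        if record == bad_record and record not in already_failed:
            already_failed.add(record)
            raise RuntimeError(f"transient failure on {record!r}")

    return handle


def run_consumer(shard, handle, batch_size=3, max_restarts=5):
    """Process the shard, checkpointing only after a fully successful batch."""
    checkpoint = 0          # last durable position
    position = checkpoint   # where the next read starts
    processed = []          # every record handled, duplicates included
    restarts = 0
    while True:
        batch = shard.read(position, batch_size)
        if not batch:
            return processed, restarts
        try:
            for record in batch:
                handle(record)
                processed.append(record)
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise
            # Not checkpointing is not enough: we must explicitly rewind
            # to the last checkpoint to see the failed records again.
            position = checkpoint
            continue
        position += len(batch)
        checkpoint = position  # batch succeeded, so record the progress


processed, restarts = run_consumer(FakeShard("abcdefg"), make_flaky_handler("e"))
print(processed, restarts)
```

In this run, record "d" is processed twice because it sat between the last checkpoint and the failure, which is exactly the duplicate work the application has to tolerate with this approach.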

In general, the goal is to use Kinesis to drive useful processing. Reprocessing duplicate records is usually not useful (and just costs you money, paid to AWS), so checkpointing frequently means less time and money wasted on reprocessing after a failure.

You can checkpoint on a time basis (every X seconds), on a record basis (every Y records), on every batch, never, or whatever you want. It all depends on how much waste you can tolerate in the event of a failure.
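
As a rough illustration of the time-based and record-based options, the sketch below decides when a checkpoint is due. `CheckpointPolicy` and its parameters are hypothetical names, not part of any AWS library; in a real record processor you would call your actual checkpoint method whenever `record_processed()` returns `True`.

```python
import time


class CheckpointPolicy:
    """Hypothetical helper: says when a checkpoint is due, by count or by time."""

    def __init__(self, every_records=None, every_seconds=None, clock=time.monotonic):
        self.every_records = every_records
        self.every_seconds = every_seconds
        self._clock = clock
        self._last_checkpoint_time = clock()
        self._records_since_checkpoint = 0

    def record_processed(self):
        """Call once per processed record; returns True if a checkpoint is due."""
        self._records_since_checkpoint += 1
        due_by_count = (
            self.every_records is not None
            and self._records_since_checkpoint >= self.every_records
        )
        due_by_time = (
            self.every_seconds is not None
            and self._clock() - self._last_checkpoint_time >= self.every_seconds
        )
        if due_by_count or due_by_time:
            # Reset the counters; the caller is expected to checkpoint now.
            self._records_since_checkpoint = 0
            self._last_checkpoint_time = self._clock()
            return True
        return False


# Record-based: a checkpoint becomes due on every third record.
policy = CheckpointPolicy(every_records=3)
decisions = [policy.record_processed() for _ in range(7)]
print(decisions)
```

A larger interval means fewer checkpoint writes but more records to reprocess after a restart, so the values are a trade-off rather than something with a single right answer.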

Note: Keep in mind that the checkpointing mechanism is backed by a DynamoDB table, so checkpointing too often carries some minor costs (you need to make sure the table has adequate throughput).

Krease answered Jan 13 '23 10:01
