How to handle reprocessing scenarios in AWS Kinesis?

I am exploring AWS Kinesis for a data processing requirement that replaces an old batch ETL process with a stream-based approach.

One of the key requirements for this project is the ability to reprocess data in the following cases:

  • A bug is discovered and fixed, and the application is redeployed. The data then needs to be reprocessed from the beginning.
  • New features are added, and the history needs to be reprocessed either fully or partially.

These scenarios are documented very nicely for Kafka here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios

I have seen the timestamp-based ShardIterator in Kinesis, and I think a Kafka-like resetter tool could be built using the Kinesis APIs, but it would be great if something like this already exists. Even if it doesn't, it would be good to learn from those who have solved similar problems.
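A timestamp-based replay of this kind can be sketched with the boto3 Kinesis client calls `list_shards`, `get_shard_iterator` and `get_records`. This is a minimal sketch under stated assumptions, not an existing resetter tool: it assumes the records are still within the stream's retention period, it replays shards one after another, and it ignores the parent/child ordering that resharding introduces. `since=None` uses the `TRIM_HORIZON` iterator ("reprocess from the beginning"); a datetime uses `AT_TIMESTAMP` ("reprocess partially"). The client is passed in, so any boto3 `kinesis` client can be used; the function names are hypothetical.

```python
import time
from datetime import datetime
from typing import Any, Callable, Dict, Iterator, Optional

Record = Dict[str, Any]


def iter_shard_ids(client: Any, stream_name: str) -> Iterator[str]:
    """Yield every shard ID in the stream, following ListShards pagination."""
    resp = client.list_shards(StreamName=stream_name)
    while True:
        for shard in resp["Shards"]:
            yield shard["ShardId"]
        token = resp.get("NextToken")
        if not token:
            return
        # ListShards rejects StreamName when NextToken is supplied.
        resp = client.list_shards(NextToken=token)


def replay_shard(
    client: Any,
    stream_name: str,
    shard_id: str,
    handle: Callable[[Record], None],
    since: Optional[datetime] = None,
    pause: float = 0.2,
) -> int:
    """Re-read one shard, pass each record to `handle`, return the record count.

    since=None starts at the oldest retained record (TRIM_HORIZON);
    otherwise reading starts at the position given by `since` (AT_TIMESTAMP).
    """
    args: Dict[str, Any] = {"StreamName": stream_name, "ShardId": shard_id}
    if since is None:
        args["ShardIteratorType"] = "TRIM_HORIZON"
    else:
        args["ShardIteratorType"] = "AT_TIMESTAMP"
        args["Timestamp"] = since
    iterator = client.get_shard_iterator(**args)["ShardIterator"]

    count = 0
    while iterator:
        resp = client.get_records(ShardIterator=iterator, Limit=1000)
        for record in resp["Records"]:
            handle(record)
            count += 1
        if not resp["Records"] and resp.get("MillisBehindLatest", 0) == 0:
            break  # caught up with the tip of an open shard: replay is done
        # NextShardIterator is missing/None once a closed shard is exhausted.
        iterator = resp.get("NextShardIterator")
        time.sleep(pause)  # stay under the 5 GetRecords calls/sec/shard limit
    return count


def replay_stream(
    client: Any,
    stream_name: str,
    handle: Callable[[Record], None],
    since: Optional[datetime] = None,
    pause: float = 0.2,
) -> int:
    """Replay all shards sequentially (no cross-shard or parent/child ordering)."""
    return sum(
        replay_shard(client, stream_name, sid, handle, since, pause)
        for sid in iter_shard_ids(client, stream_name)
    )
```

For example, `replay_stream(boto3.client("kinesis"), "my-stream", process)` would re-run a hypothetical `process` function over everything still retained in a stream named `my-stream` (both names are placeholders). Writing the consumer so that `handle` is idempotent keeps a replay from double-counting results that were already produced.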

So, does anyone know of any existing resources, patterns, or tools for doing this in Kinesis?

asked Feb 16 '18 09:02 by Rahul

1 Answer

I have run into scenarios where I wanted to reprocess records that had already been processed from Kinesis. I used Kinesis-VCR to reprocess those records.

Kinesis-VCR records Kinesis streams and maintains metadata about the files processed by Kinesis at any given time.

Later, you can use it to reprocess/replay the events for any given time range.
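Because a Kinesis stream only retains records for a limited period (24 hours by default), replaying older history needs an external archive plus an index of what was recorded when. The sketch below illustrates only that general archive-and-index idea; it is not Kinesis-VCR's actual API or metadata format. The `ArchivedFile` entry and `files_for_range` helper are hypothetical: each archived file is assumed to record the arrival times of its first and last record, and replaying a time range means reading every file whose span overlaps it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class ArchivedFile:
    """Hypothetical metadata entry for one archived batch of stream records."""
    key: str                 # e.g. an object key in S3 (illustrative only)
    first_arrival: datetime  # arrival time of the first record in the file
    last_arrival: datetime   # arrival time of the last record in the file


def files_for_range(
    index: Iterable[ArchivedFile], start: datetime, end: datetime
) -> List[str]:
    """Return keys of files that may hold records in [start, end], oldest first."""
    if start > end:
        raise ValueError("start must not be after end")
    # A file overlaps the range unless it ends before `start` or begins after `end`.
    hits = [f for f in index if f.first_arrival <= end and f.last_arrival >= start]
    return [f.key for f in sorted(hits, key=lambda f: f.first_arrival)]
```

Files at the edges of the range can contain records outside it, so a replayer built this way would still filter each record by its own timestamp before reprocessing it.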

Here is the GitHub link:

https://github.com/scopely/kinesis-vcr

Let me know if this works for you.

Thanks & Regards, Srivignesh KN

answered Sep 27 '22 01:09 by Srivignesh KN