I am trying to implement an event-driven architecture using Amazon Kinesis as the central event log of the platform. The idea is pretty much the same as the one presented by Nordstrom in the Hello-Retail project.
I have done similar things with Apache Kafka before, but Kinesis seems to be a cost-effective alternative to Kafka, so I decided to give it a shot. I am, however, facing some challenges related to event persistence and replay. I have two questions:
I'm currently using a Lambda function (Firehose is also an option) to persist all events to Amazon S3. A consumer could then read past events from storage and start listening to new events coming from the stream. But I'm not happy with this solution: consumers would not be able to use Kinesis checkpoints (the equivalent of Kafka's consumer offsets). Plus, Java's KCL does not support the AFTER_SEQUENCE_NUMBER shard iterator type yet, which would be useful in such an implementation.
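To make the setup concrete, here is a minimal sketch of what the persistence Lambda looks like, assuming an `events-archive` bucket and a shard/sequence key layout (both are placeholders, not part of my actual setup):

```java
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Copies every record from the stream into S3. The event ID has the form
// "shardId:sequenceNumber", so using it as the object key preserves enough
// information to rebuild per-shard ordering on replay.
public class EventArchiver implements RequestHandler<KinesisEvent, Void> {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            String payload = StandardCharsets.UTF_8
                    .decode(record.getKinesis().getData())
                    .toString();
            String key = record.getEventID().replace(':', '/');
            s3.putObject("events-archive", key, payload); // placeholder bucket
        }
        return null;
    }
}
```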
Kinesis offers replay out of the box, even with multiple consumers. In Kinesis, consumers do not delete messages once they are done processing; records stay in the stream for the retention period, so replay is possible.
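To illustrate, a replay consumer just asks for a shard iterator at TRIM_HORIZON (or AT_SEQUENCE_NUMBER to resume from a specific point) and pages through the retained records. A rough sketch with the AWS SDK for Java; the stream name and shard id are placeholders:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;
import com.amazonaws.services.kinesis.model.GetShardIteratorRequest;
import com.amazonaws.services.kinesis.model.Record;

public class ReplayConsumer {
    public static void main(String[] args) throws InterruptedException {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // TRIM_HORIZON starts at the oldest record still retained,
        // i.e. a full replay of whatever the retention window holds.
        String iterator = kinesis.getShardIterator(new GetShardIteratorRequest()
                .withStreamName("events")            // placeholder stream name
                .withShardId("shardId-000000000000") // placeholder shard id
                .withShardIteratorType("TRIM_HORIZON"))
                .getShardIterator();

        while (iterator != null) {
            GetRecordsResult result = kinesis.getRecords(
                    new GetRecordsRequest().withShardIterator(iterator).withLimit(1000));
            for (Record record : result.getRecords()) {
                // Reprocess the event; sequence numbers preserve per-shard order.
                System.out.println(record.getSequenceNumber());
            }
            // Stop once we have caught up with the tip of the shard.
            if (result.getMillisBehindLatest() != null && result.getMillisBehindLatest() == 0) {
                break;
            }
            iterator = result.getNextShardIterator();
            Thread.sleep(200); // stay under the 5 reads/sec per-shard limit
        }
    }
}
```

Each consumer keeps its own iterator, so several independent applications can replay the same stream concurrently.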
A Kinesis data stream stores records for 24 hours by default, and retention can be increased up to 8,760 hours (365 days). You can update the retention period via the Kinesis Data Streams console or by using the IncreaseStreamRetentionPeriod and DecreaseStreamRetentionPeriod operations.
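For example, extending retention to 7 days with the AWS SDK for Java (the stream name and the 168-hour target are illustrative):

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.IncreaseStreamRetentionPeriodRequest;

public class ExtendRetention {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        // Raise retention from the 24-hour default to 7 days (168 hours).
        kinesis.increaseStreamRetentionPeriod(new IncreaseStreamRetentionPeriodRequest()
                .withStreamName("events") // placeholder stream name
                .withRetentionPeriodHours(168));
    }
}
```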
Kafka is more configurable than Kinesis. With Kafka, it is possible to write data to a single server (a replication factor of one), whereas Kinesis always replicates each write synchronously across three Availability Zones. That extra replication is the constraint that can make Kafka the better-performing option in raw throughput and latency.
CQRS is implemented by separating the responsibility for commands (writes) from queries (reads), and event sourcing is implemented by using an ordered sequence of events to track changes in data.
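As a toy illustration of the event-sourcing half (the `Account` domain and its event types are invented for the example, Java 16+): current state is never stored directly, only derived by folding over the event log.

```java
import java.util.List;

public class Account {
    // Hypothetical events for the example; in the real system these
    // would be the records read from the Kinesis stream.
    interface Event {}
    record Deposited(long amount) implements Event {}
    record Withdrawn(long amount) implements Event {}

    private long balance;

    // State is rebuilt by replaying the ordered event sequence from the start.
    static Account replay(List<Event> events) {
        Account account = new Account();
        for (Event e : events) {
            if (e instanceof Deposited d) account.balance += d.amount();
            else if (e instanceof Withdrawn w) account.balance -= w.amount();
        }
        return account;
    }

    public static void main(String[] args) {
        Account a = Account.replay(List.of(new Deposited(100), new Withdrawn(30)));
        System.out.println(a.balance); // 70
    }
}
```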
First question: yes, I use Kinesis Streams when I need to process the received log/event data before storing it in S3. When I don't, I use Kinesis Firehose.
Second question: Kinesis Streams can store data for up to seven days (see the higher retention limits discussed above). This is not forever, but it should be enough time to process your events. Depending on the value of the events being processed ...
If I do not need to process the event stream before storing it in S3, then I use Kinesis Firehose writing to S3. That way I do not have to worry about event failures, persistence, and so on. I then process the data stored in S3 with the best tool for the job; I use Amazon Athena often, and Amazon Redshift too.
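On the producer side this is pleasantly small. A sketch with the AWS SDK for Java, where the `events-to-s3` delivery stream name and the JSON payload are placeholders; Firehose itself handles the buffering and batched delivery to S3:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;

public class FirehoseWriter {
    public static void main(String[] args) {
        AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();
        // Firehose buffers records and flushes them to S3 in batches,
        // so there are no per-event S3 writes to manage.
        firehose.putRecord(new PutRecordRequest()
                .withDeliveryStreamName("events-to-s3") // placeholder name
                .withRecord(new Record().withData(ByteBuffer.wrap(
                        "{\"type\":\"example\"}\n".getBytes(StandardCharsets.UTF_8)))));
    }
}
```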
You don't mention how much data you are processing or how it is being processed. If it is large (several MB/sec or higher), then I would definitely use Kinesis Firehose. With Kinesis Streams you have to manage performance yourself: per-shard throughput limits and resharding as volume grows.
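For context, each shard accepts about 1 MB/sec or 1,000 records/sec of writes, so scaling means adjusting the shard count. A sketch using UpdateShardCount (stream name and target count are illustrative):

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.UpdateShardCountRequest;

public class Reshard {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        // Each shard takes roughly 1 MB/sec (or 1,000 records/sec) of writes,
        // so the shard count must grow with the ingest rate.
        kinesis.updateShardCount(new UpdateShardCountRequest()
                .withStreamName("events")  // placeholder stream name
                .withTargetShardCount(4)   // illustrative target
                .withScalingType("UNIFORM_SCALING"));
    }
}
```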
One issue that I have with Kinesis Streams is that I don't like the client libraries, so I prefer to write everything myself. Kinesis Firehose reduces the coding for custom applications: you just store the data in S3 and process it afterwards.
I like to think of S3 as my big data lake. I prefer to throw everything into S3 without preprocessing and then use various tools to pull out the data that I need. By doing this I remove lots of points of failure that need to be managed.