I am trying to implement an event-driven architecture using Amazon Kinesis as the central event log of the platform. The idea is pretty much the same as the one presented by Nordstrom in the Hello-Retail project.
I have done similar things with Apache Kafka before, but Kinesis seems to be a cost-effective alternative to Kafka, so I decided to give it a shot. I am, however, facing some challenges related to event persistence and replay. I have two questions:
I'm currently using a Lambda function (Firehose is also an option) to persist all events to Amazon S3. A consumer could then read past events from storage and start listening to new events coming from the stream. But I'm not happy with this solution: consumers would not be able to use Kinesis checkpoints (the equivalent of Kafka's consumer offsets). Plus, Java's KCL does not support the AFTER_SEQUENCE_NUMBER shard iterator type yet, which would be useful in such an implementation.
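To make the setup concrete, here is a minimal sketch of what the persistence Lambda looks like, assuming an `events-archive` bucket and a shard/sequence key layout (both are placeholders, not part of my actual setup):

```java
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Copies every record from the stream into S3. The event ID has the form
// "shardId:sequenceNumber", so using it as the object key preserves enough
// information to rebuild per-shard ordering on replay.
public class EventArchiver implements RequestHandler<KinesisEvent, Void> {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            String payload = StandardCharsets.UTF_8
                    .decode(record.getKinesis().getData())
                    .toString();
            String key = record.getEventID().replace(':', '/');
            s3.putObject("events-archive", key, payload); // placeholder bucket
        }
        return null;
    }
}
```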
Kinesis offers replay out of the box, even with multiple consumers. In Kinesis, consumers do not delete messages once they are done processing; records stay in the stream for the retention period, so replay is possible.
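To illustrate, a replay consumer just asks for a shard iterator at TRIM_HORIZON (or AT_SEQUENCE_NUMBER to resume from a specific point) and pages through the retained records. A rough sketch with the AWS SDK for Java; the stream name and shard id are placeholders:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;
import com.amazonaws.services.kinesis.model.GetShardIteratorRequest;
import com.amazonaws.services.kinesis.model.Record;

public class ReplayConsumer {
    public static void main(String[] args) throws InterruptedException {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // TRIM_HORIZON starts at the oldest record still retained,
        // i.e. a full replay of whatever the retention window holds.
        String iterator = kinesis.getShardIterator(new GetShardIteratorRequest()
                .withStreamName("events")            // placeholder stream name
                .withShardId("shardId-000000000000") // placeholder shard id
                .withShardIteratorType("TRIM_HORIZON"))
                .getShardIterator();

        while (iterator != null) {
            GetRecordsResult result = kinesis.getRecords(
                    new GetRecordsRequest().withShardIterator(iterator).withLimit(1000));
            for (Record record : result.getRecords()) {
                // Reprocess the event; sequence numbers preserve per-shard order.
                System.out.println(record.getSequenceNumber());
            }
            // Stop once we have caught up with the tip of the shard.
            if (result.getMillisBehindLatest() != null && result.getMillisBehindLatest() == 0) {
                break;
            }
            iterator = result.getNextShardIterator();
            Thread.sleep(200); // stay under the 5 reads/sec per-shard limit
        }
    }
}
```

Each consumer keeps its own iterator, so several independent applications can replay the same stream concurrently.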
A Kinesis data stream stores records for 24 hours by default, and retention can be increased up to 8,760 hours (365 days). You can update the retention period via the Kinesis Data Streams console or by using the IncreaseStreamRetentionPeriod and DecreaseStreamRetentionPeriod operations.
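For example, extending retention to 7 days with the AWS SDK for Java (the stream name and the 168-hour target are illustrative):

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.IncreaseStreamRetentionPeriodRequest;

public class ExtendRetention {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        // Raise retention from the 24-hour default to 7 days (168 hours).
        kinesis.increaseStreamRetentionPeriod(new IncreaseStreamRetentionPeriodRequest()
                .withStreamName("events") // placeholder stream name
                .withRetentionPeriodHours(168));
    }
}
```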
Kafka is more configurable than Kinesis. With Kafka, it is possible to write data to a single server (a replication factor of one), whereas Kinesis always replicates each write synchronously across three Availability Zones. That extra replication is the constraint that can make Kafka the better-performing option in raw throughput and latency.
CQRS is implemented by separating the responsibility for commands (writes) from queries (reads), and event sourcing is implemented by using an ordered sequence of events to track changes in data.
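As a toy illustration of the event-sourcing half (the `Account` domain and its event types are invented for the example, Java 16+): current state is never stored directly, only derived by folding over the event log.

```java
import java.util.List;

public class Account {
    // Hypothetical events for the example; in the real system these
    // would be the records read from the Kinesis stream.
    interface Event {}
    record Deposited(long amount) implements Event {}
    record Withdrawn(long amount) implements Event {}

    private long balance;

    // State is rebuilt by replaying the ordered event sequence from the start.
    static Account replay(List<Event> events) {
        Account account = new Account();
        for (Event e : events) {
            if (e instanceof Deposited d) account.balance += d.amount();
            else if (e instanceof Withdrawn w) account.balance -= w.amount();
        }
        return account;
    }

    public static void main(String[] args) {
        Account a = Account.replay(List.of(new Deposited(100), new Withdrawn(30)));
        System.out.println(a.balance); // 70
    }
}
```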
First question: yes, I use Kinesis Streams when I need to process the received log/event data before storing it in S3. When I don't, I use Kinesis Firehose.
Second question: Kinesis Streams can store data for up to seven days (see the higher retention limits discussed above). This is not forever, but it should be enough time to process your events. Depending on the value of the events being processed ...
If I do not need to process the event stream before storing it in S3, then I use Kinesis Firehose writing to S3. That way I do not have to worry about event failures, persistence, and so on. I then process the data stored in S3 with the best tool for the job; I use Amazon Athena often, and Amazon Redshift too.
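On the producer side this is pleasantly small. A sketch with the AWS SDK for Java, where the `events-to-s3` delivery stream name and the JSON payload are placeholders; Firehose itself handles the buffering and batched delivery to S3:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;

public class FirehoseWriter {
    public static void main(String[] args) {
        AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();
        // Firehose buffers records and flushes them to S3 in batches,
        // so there are no per-event S3 writes to manage.
        firehose.putRecord(new PutRecordRequest()
                .withDeliveryStreamName("events-to-s3") // placeholder name
                .withRecord(new Record().withData(ByteBuffer.wrap(
                        "{\"type\":\"example\"}\n".getBytes(StandardCharsets.UTF_8)))));
    }
}
```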
You don't mention how much data you are processing or how it is being processed. If it is large (several MB/sec or higher), then I would definitely use Kinesis Firehose. With Kinesis Streams you have to manage performance yourself: per-shard throughput limits and resharding as volume grows.
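For context, each shard accepts about 1 MB/sec or 1,000 records/sec of writes, so scaling means adjusting the shard count. A sketch using UpdateShardCount (stream name and target count are illustrative):

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.UpdateShardCountRequest;

public class Reshard {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        // Each shard takes roughly 1 MB/sec (or 1,000 records/sec) of writes,
        // so the shard count must grow with the ingest rate.
        kinesis.updateShardCount(new UpdateShardCountRequest()
                .withStreamName("events")  // placeholder stream name
                .withTargetShardCount(4)   // illustrative target
                .withScalingType("UNIFORM_SCALING"));
    }
}
```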
One issue that I have with Kinesis Streams is that I don't like the client libraries, so I prefer to write everything myself. Kinesis Firehose reduces the coding for custom applications: you just store the data in S3 and process it afterwards.
I like to think of S3 as my big data lake. I prefer to throw everything into S3 without preprocessing and then use various tools to pull out the data that I need. By doing this I remove lots of points of failure that need to be managed.