
DynamoDB - Event Store on AWS

I'm designing an Event Store on AWS and I chose DynamoDB because it seemed the best option. My design seems to be quite good, but I'm facing some issues that I can't solve.

The design

Events are uniquely identified by the pair (StreamId, EventId):

  • StreamId: it's the same as the aggregateId, which means one Event Stream per Aggregate.
  • EventId: an incremental number that helps keep the ordering within the same Event Stream.

Events are persisted in DynamoDB. Each event maps to a single record in a table whose mandatory fields are StreamId, EventId, EventName and Payload (more fields can be added easily).

The partitionKey is the StreamId, the sortKey is the EventId.
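As a rough sketch (Python/boto3; the table name EventStore is an assumption), the key schema described above would look like this:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical table name; StreamId/EventId match the key schema described above.
dynamodb.create_table(
    TableName="EventStore",
    AttributeDefinitions=[
        {"AttributeName": "StreamId", "AttributeType": "S"},
        {"AttributeName": "EventId", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "StreamId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "EventId", "KeyType": "RANGE"},   # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)
```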

Optimistic locking is used when writing an event to an Event Stream. To achieve this, I'm using DynamoDB conditional writes: if an event with the same (StreamId, EventId) already exists, I need to recompute the aggregate, recheck the business conditions, and write again only if they still pass.
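A minimal sketch of that conditional write, assuming the same EventStore table and boto3 (the helper name append_event is mine):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def append_event(stream_id: str, event_id: int, event_name: str, payload: str) -> bool:
    """Append an event, failing if (StreamId, EventId) already exists."""
    try:
        dynamodb.put_item(
            TableName="EventStore",
            Item={
                "StreamId": {"S": stream_id},
                "EventId": {"N": str(event_id)},
                "EventName": {"S": event_name},
                "Payload": {"S": payload},
            },
            # The conditional write: reject the put if another writer already
            # appended an event with this (StreamId, EventId).
            ConditionExpression="attribute_not_exists(StreamId) AND attribute_not_exists(EventId)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # concurrent write: recompute the aggregate and retry
        raise
```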

Event Streams

Each Event Stream is identified by the partitionKey. Querying a stream for all of its events is equivalent to querying for partitionKey=${streamId} and sortKey between 0 and MAX_INT.
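A sketch of such a stream query (boto3; since the whole partition is wanted, a key condition on StreamId alone is equivalent to the 0..MAX_INT range):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def load_stream(stream_id: str):
    """Return all events for one aggregate, in ascending EventId order."""
    events, last_key = [], None
    while True:
        kwargs = {
            "TableName": "EventStore",
            "KeyConditionExpression": "StreamId = :sid",
            "ExpressionAttributeValues": {":sid": {"S": stream_id}},
            "ScanIndexForward": True,  # ascending by the EventId sort key
        }
        if last_key:
            kwargs["ExclusiveStartKey"] = last_key
        page = dynamodb.query(**kwargs)
        events.extend(page["Items"])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return events
```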

Each Event Stream identifies one and only one aggregate. This helps to handle concurrent writes on the same aggregate using optimistic locking, as explained before. It also gives great performance when recomputing an aggregate.

Publication of events

Events are published by combining DynamoDB Streams with Lambda.
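A sketch of the consumer side, assuming a Lambda subscribed to the table's DynamoDB Stream (NEW_IMAGE view) that republishes inserts; the SNS topic here is an assumption, and any downstream publisher would work:

```python
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:domain-events"  # assumed topic

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # only newly appended events are published
        image = record["dynamodb"]["NewImage"]
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({
                "streamId": image["StreamId"]["S"],
                "eventId": image["EventId"]["N"],
                "eventName": image["EventName"]["S"],
                "payload": image["Payload"]["S"],
            }),
        )
```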

Replay events

Here's where the issues start. Since each event stream maps to only one aggregate (which leads to a great number of event streams), there's no easy way to know which event streams I need to query for all events.

I was thinking of using an additional record somewhere in DynamoDB that stores all StreamIds in an array. I could then query for it and start querying for the events, but if a new stream is created while I'm replaying, I'll miss it.

Am I missing something? Or, is my design simply wrong?

asked Apr 19 '19 by Christian Paesante


2 Answers

You can use a GSI to retrieve the events in a given time period. Depending on the number of events being processed, you might need to write-shard the GSI to avoid hot keys. Assuming the event items are less than 1 KB, you will need to spread them out on the GSI if the ingestion rate is higher than 1,000 items/sec; if the events are larger than 1 KB, you will need to spread them out more. For items less than 1 KB, take the total number of events per second and divide by 1,000: this tells you how many shards the GSI needs to keep up with the table. For example, if you are ingesting 5K events per second, you will need 5 shards.

When you write an event to the table, add a new attribute called "GSIKey" and generate a random value between 0 and 4 for it. Create the GSI using "GSIKey" as the partition key and a timestamp as the sort key. When you need all events in a given time range, query all 5 shards with that range and then merge-sort the result sets to produce a time-ordered list of events. If you are processing fewer than 1,000 events per second, you can use "0" as the GSIKey value and query just that one partition for the events you need.
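A sketch of this write-sharded GSI approach (Python/boto3; the index name and the Timestamp attribute are assumptions, and pagination is omitted for brevity):

```python
import heapq
import random
import boto3

dynamodb = boto3.client("dynamodb")
SHARD_COUNT = 5  # sized from the ~5K events/sec example above

def add_gsi_key(item: dict) -> dict:
    """Attach a random shard key before writing so the GSI load is spread."""
    item["GSIKey"] = {"N": str(random.randint(0, SHARD_COUNT - 1))}
    return item

def query_time_range(start_ts: int, end_ts: int):
    """Query every shard for the window, then merge-sort by timestamp."""
    per_shard = []
    for shard in range(SHARD_COUNT):
        page = dynamodb.query(
            TableName="EventStore",
            IndexName="GSIKey-Timestamp-index",  # assumed GSI name
            KeyConditionExpression="GSIKey = :k AND #ts BETWEEN :lo AND :hi",
            ExpressionAttributeNames={"#ts": "Timestamp"},  # Timestamp is a reserved word
            ExpressionAttributeValues={
                ":k": {"N": str(shard)},
                ":lo": {"N": str(start_ts)},
                ":hi": {"N": str(end_ts)},
            },
        )
        per_shard.append(page["Items"])
    # Each shard's result is already time-ordered; merge them into one list.
    return list(heapq.merge(*per_shard, key=lambda i: int(i["Timestamp"]["N"])))
```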

answered Oct 17 '22 by Rick Houlihan


Am I missing something?

Not really; it's a Hard Problem[tm].

Your write use cases are typically only concerned with a single reference within the model -- the pointer to the current history of events. Your read use cases are often concerned with data distributed across multiple streams.

The way that this usually works is that your persistence store not only maintains the changes that have been written, but also an index that supports reads. For example, Eventide's postgres message store depends on the indexing that happens when you insert rows into a table. In the case of Event Store, the updates to the index are written as part of the same serialized "transaction" as the changes to the stream(s).

Another way of expressing the same idea: the queries are actually running at a coarser grain than the writes, with the storage appliance implicitly providing the coordination guarantees that you expect.

Take away the coordination, and you have something analogous to assigning a unique host to each stream.

It may be useful to look carefully at the Git object database and familiarize yourself with what's really happening in that store under the covers. I also found that Rich Hickey's talk The Language of the System provided useful concepts in distinguishing values from names from references.

I chose DynamoDB because it seemed the best option

Unless you have some compelling business reason to build your event store from the ground up, I'd encourage you to instead look at Aurora and see how far you can get with that. It might buy you the time you need while waiting for somebody else to put together a cost-effective, cloud-native event store appliance for you.

answered Oct 17 '22 by VoiceOfUnreason