We're currently using an SQL-backed Event Store (the typical 2-table implementation) and some people in the team are afraid that even though we're using the Event Store only for writes, things may get a bit slower, so a suggestion was put in place to instead of adding snapshots here and there, to actually maintain a fully-consistent (with the event streams) snapshot of each aggregate in its most recent state (in JSON format). All the querying on the system will end up being done on the read-side, with a typical SQL database that is updated in an eventual consistency fashion from the ES (write) side.
Having such a system in place would allow us to enjoy the benefits of having an Event Store while simultaneously removing any possible performance issues altogether. We are currently not making use of any "time-travelling" feature, although sooner of later that will end up being the case.
Is this a good approach? There's something in it leaving my uncomfortable. For instance, if we need some sort of time-travelling feature, not having snapshots here and there in each aggregate's event-stream will prove a performance disaster. Of course we could have both a most-current-snapshot per aggregate instance and also snapshots throughout the event-streams.
In case we decide to go down this route, should we make the snapshot update for a given aggregate transactional to the events updates on that same aggregate, or should we just update the events and in an eventually-consistent manner update the snapshot?
What are the downsides of this approach? Has anyone tried something of the kind?
Snapshots are a way to reduce the amount of events you need to fetch and apply when instantiating your aggregate.
The snapshot event is a domain event with a special purpose: it summarises an arbitrary amount of events into a single one. By regularly creating and storing a snapshot event, the event store does not have to return long lists of events.
An aggregate is a cluster of associated objects that we treat as a unit for the purpose of data changes. Each aggregate has a root and a boundary. The boundary defines what is inside the aggregate. The root is a single, specific entity contained in the aggregate.
The core features such as guaranteed writes, concurrency model, granular stream and stream APIs make EventStoreDB the best choice for event-sourced systems - especially when compared with other database solutions originally built for other purposes, And on top of that, it's open source.
You should probably run your own benchmarks before adding unnecessary complexity to your system. We have noticed some performance problems when thousands of events need to be queried and applied to rebuild an aggregate from the event stream, where JSON to object deserialization was the biggest performance bottleneck. If each of your aggregates has only few events (say, < 100) you probably won't notice any significant differences in practice.
Most event stores record snapshots every n
events/commits, say every 50-100 events, and on assembly query the latest snapshots and apply the missing events since the last snapshot. If you also keep all old snapshots in your snapshot database, the time traveling feature will be as fast as a usual query, and you'll only need slightly more persistence space, which is cheap nowadays.
The snapshots should always be written out of the original transaction (and can be generated in another thread), since it's non-crucial if the last snapshot is missing, but you want to don't want your business transaction to fail due to errors in the snapshot write transaction.
Depending on your usual system uptime and data size, it might make sense to held snapshots in memory or a distributed cache/graid or in another database (non-SQL).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With