
Snapshot taking and restore strategies

I've been reading about CQRS + Event Sourcing patterns (which I wish to apply in the near future), and one point common to all the decks and presentations I found is to take snapshots of your model state in order to restore it. None of them, however, share patterns or strategies for doing that.

I wonder if you could share your thoughts and experience in this matter particularly in terms of:

  • When to snapshot
  • How to model a snapshot store
  • Application/cache cold start

TL;DR: How have you implemented Snapshotting in your CQRS+EventSourcing application? Pros and Cons?

Mikhas asked Jun 24 '16 20:06



2 Answers

There are few instances where you need to snapshot for sure, but there are a couple. A common example is an account in a ledger: you'll have thousands, maybe millions, of credit/debit events producing the final balance of the account, and it would be insane not to snapshot that every so often.

My approach to snapshotting when I designed Aggregates.NET was that it's off by default. To enable it, your aggregates or entities must inherit from AggregateWithMemento or EntityWithMemento, which in turn requires your entity to define a RestoreSnapshot, a TakeSnapshot and a ShouldTakeSnapshot method.

The decision whether to take a snapshot or not is left up to the entity itself. A common pattern is

    Boolean ShouldTakeSnapshot()
    {
        return this.Version % 50 == 0;
    }

Which of course would take a snapshot every 50 events.

When reading the entity stream, the first thing we do is check for a snapshot, then read the rest of the entity's stream from the moment the snapshot was taken. In other words: don't ask for the entire stream, just the part we have not snapshotted.
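A minimal sketch of that read path in Java, using the ledger account as the example. All names here (BalanceSnapshot, AccountLoader, the in-memory maps) are made up for illustration and are not the Aggregates.NET API; the event stream is simplified to a list of signed amounts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical snapshot: the stream version it covers plus the balance at that point.
record BalanceSnapshot(int version, long balance) {}

class AccountLoader {
    // Stand-in event store: one stream of signed amounts per account (credit > 0, debit < 0).
    final Map<String, List<Long>> streams = new HashMap<>();
    // Stand-in snapshot store: at most one snapshot per account id.
    final Map<String, BalanceSnapshot> snapshots = new HashMap<>();
    int eventsRead = 0; // counts replayed events, for illustration only

    void append(String id, long amount) {
        List<Long> stream = streams.computeIfAbsent(id, k -> new ArrayList<>());
        stream.add(amount);
        int version = stream.size();
        // Same policy as ShouldTakeSnapshot above: snapshot every 50 events.
        if (version % 50 == 0) {
            snapshots.put(id, new BalanceSnapshot(version, load(id)));
        }
    }

    long load(String id) {
        List<Long> stream = streams.getOrDefault(id, List.of());
        Optional<BalanceSnapshot> snap = Optional.ofNullable(snapshots.get(id));
        // Start from the snapshot if there is one, otherwise from the empty state.
        int from = snap.map(BalanceSnapshot::version).orElse(0);
        long balance = snap.map(BalanceSnapshot::balance).orElse(0L);
        // Read only the events recorded after the snapshot was taken.
        for (long amount : stream.subList(from, stream.size())) {
            balance += amount;
            eventsRead++;
        }
        return balance;
    }
}
```

With 120 events in a stream, the latest snapshot sits at version 100, so a load only replays the last 20 events.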

As for the store, you can use literally anything. VoiceOfUnreason is right, though: a key-value store is best, because you only need to (1) check if a snapshot exists and (2) load the entire thing, which is ideal for a key-value store.

For system restarts, I'm not really following what your described problem is. There's no reason for your domain server to be stateful in the sense that it's doing something different at different points in time. It should do just one thing: process the next command. In the process of handling a command it loads data from the event store, including a snapshot, and runs the command against the entity, which produces either a business exception or domain events that are recorded to the store.
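To make the "just process the next command" idea concrete, here is a rough Java sketch. The names (WithdrawHandler, InsufficientFunds) and the in-memory list standing in for the event store are assumptions for illustration; snapshot loading is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical business exception for a ledger account.
class InsufficientFunds extends RuntimeException {
    InsufficientFunds(String message) { super(message); }
}

class WithdrawHandler {
    // Stand-in for the event store: signed amounts for a single account.
    final List<Long> store = new ArrayList<>();

    // Handling a command: load, decide, record. Nothing is kept between calls.
    void handle(long amount) {
        // 1. Load the current state from the recorded history.
        long balance = 0;
        for (long e : store) {
            balance += e;
        }
        // 2. Run the command: either a business exception...
        if (amount > balance) {
            throw new InsufficientFunds("balance " + balance + " < " + amount);
        }
        // 3. ...or a domain event, which is recorded to the store.
        store.add(-amount);
    }
}
```

Because the handler holds no state of its own between commands, a restart changes nothing about what it does next.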

I think you may be trying to optimize too much with this talk of caching and cold starts.

Charles answered Sep 25 '22 02:09


  • Rule #1: Don't.
  • Rule #2: Don't.

Snapshotting an event sourced model is a performance optimization. The first rule of performance optimization? Don't.

Specifically, snapshotting reduces the amount of time you lose in your repository trying to reload the history of your model from your event store.

If your repository can keep the model in memory, then you aren't going to be reloading it very often. So the win from snapshotting will be small. Therefore: don't.

If you can decompose your model into aggregates, which is to say that you can decompose the history of your model into a number of entities that have non-overlapping histories, then your one long model history becomes many, many short histories that each describe the changes to a single entity. Each entity history that you need to load will be pretty short, so the win from a snapshot will be small. Therefore: don't.

"The kind of systems I'm working on today require high performance but not 24x7 availability. So in a situation where I shut down my system for maintenance and restart it, I'd have to load and reprocess my whole event store, as my fresh system doesn't know which aggregate ids to process the events for. I need a better starting point so that my system restarts are more efficient."

You are worried about missing a write SLA when the repository memory caches are cold, and you have long model histories with lots of events to reload. Bolting on snapshotting might be a lot more reasonable than trying to refactor your model history into smaller streams. OK....

The snapshot store is a read model -- at any point in time, you should be able to blow away the model and rebuild it from the persisted history in the event store.

From the perspective of the repository, the snapshot store is a cache; if no snapshot is available, or if the store itself doesn't respond within the SLA, you want to fall back to reprocessing the entire event history, starting from the initial seed state.
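One way to sketch that fallback in Java. Everything here is an assumption for illustration: CachedState, FallbackRepository, the Supplier standing in for the snapshot store, and the list of signed amounts standing in for the event history.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical snapshot: how many events it covers and the state accumulated so far.
record CachedState(int eventsApplied, long total) {}

class FallbackRepository {
    static final CachedState SEED = new CachedState(0, 0L); // initial seed state

    final List<Long> history;                         // stand-in for the event store
    final Supplier<Optional<CachedState>> snapshots;  // stand-in for the snapshot store
    final long slaMillis;                             // how long we are willing to wait

    FallbackRepository(List<Long> history,
                       Supplier<Optional<CachedState>> snapshots,
                       long slaMillis) {
        this.history = history;
        this.snapshots = snapshots;
        this.slaMillis = slaMillis;
    }

    long load() {
        CachedState start = SEED;
        try {
            // Ask the snapshot store, but only wait as long as the SLA allows.
            start = CompletableFuture.supplyAsync(snapshots)
                    .get(slaMillis, TimeUnit.MILLISECONDS)
                    .orElse(SEED); // no snapshot available: start from the seed
        } catch (TimeoutException | ExecutionException e) {
            // Store too slow or failing: treat it as a cache miss and replay everything.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long total = start.total();
        for (long amount : history.subList(start.eventsApplied(), history.size())) {
            total += amount;
        }
        return total;
    }
}
```

The repository never depends on the snapshot store for correctness; a miss or a timeout only costs a longer replay.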

The service provider interface is going to look something like

    interface SnapshotClient {
        SnapshotRecord getSnapshot(Identifier id);
    }

SnapshotRecord is going to provide to the repository the information it needs to consume the snapshot. That's going to include at a minimum

  1. a memento that allows the repository to rehydrate the snapshotted state
  2. a description of the last event processed by the snapshot projector when building the snapshot.

The model will then re-hydrate the snapshotted state from the memento, load the history from the event store, scanning backwards (ie, starting from the most recent event) looking for the event documented in the SnapshotRecord, then apply the subsequent events in order.
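As a rough Java illustration of those steps, assuming a ledger balance as the state. The LedgerEvent type, the event ids, and the choice to encode the memento as a single long are all hypothetical; only the two SnapshotRecord fields come from the description above.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical event: an id plus a signed amount.
record LedgerEvent(String eventId, long amount) {}

// The two things a SnapshotRecord carries at a minimum:
// a memento and a description of the last event processed.
record SnapshotRecord(byte[] memento, String lastEventId) {}

class Rehydrator {
    // Encode a balance as an opaque byte-array memento.
    static SnapshotRecord snapshotOf(long balance, String lastEventId) {
        byte[] memento = ByteBuffer.allocate(Long.BYTES).putLong(balance).array();
        return new SnapshotRecord(memento, lastEventId);
    }

    static long rehydrate(SnapshotRecord snapshot, List<LedgerEvent> history) {
        // 1. Restore the snapshotted state from the memento.
        long balance = ByteBuffer.wrap(snapshot.memento()).getLong();
        // 2. Scan backwards from the most recent event, looking for the
        //    event documented in the SnapshotRecord.
        int i = history.size() - 1;
        while (i >= 0 && !history.get(i).eventId().equals(snapshot.lastEventId())) {
            i--;
        }
        if (i < 0) {
            // Event not found: ignore the snapshot and replay from the seed state.
            balance = 0;
        }
        // 3. Apply the subsequent events in order.
        for (LedgerEvent e : history.subList(i + 1, history.size())) {
            balance += e.amount();
        }
        return balance;
    }
}
```

Scanning from the newest event keeps the search short when the snapshot is recent, which is the common case.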

The SnapshotRepository itself could be a key-value store (at most one record for any given id), but a relational database with blob support will work fine too:

    select *
    from snapshots s
    where id = ?
    order by s.total_events desc
    limit 1

The snapshot projector and the repository are tightly coupled -- they need to agree on what the state of the entity should be for all possible histories, they need to agree how to de/re-hydrate the memento, and they need to agree which id will be used to locate the snapshot.

The tight coupling also means that you don't need to worry particularly about the schema for the memento; a byte array will be fine.

They don't, however, need to agree with previous incarnations of themselves. Snapshot Projector 2.0 discards/ignores any snapshots left behind by Snapshot Projector 1.0 -- the snapshot store is just a cache after all.
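One possible way to get that behaviour is to stamp each snapshot with the version of the projector that wrote it and treat anything else as a miss. This is only a sketch; VersionedSnapshot, VersionedSnapshotStore and the shared map standing in for the store are invented for the example.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stored row: who wrote it, plus the opaque memento.
record VersionedSnapshot(int projectorVersion, byte[] memento) {}

class VersionedSnapshotStore {
    final int currentVersion;                     // version of the running projector/repository pair
    final Map<String, VersionedSnapshot> rows;    // stand-in for the persistent snapshot store

    VersionedSnapshotStore(int currentVersion, Map<String, VersionedSnapshot> rows) {
        this.currentVersion = currentVersion;
        this.rows = rows;
    }

    void put(String id, byte[] memento) {
        rows.put(id, new VersionedSnapshot(currentVersion, memento));
    }

    // Snapshots written by another version are treated as if they did not exist.
    Optional<byte[]> get(String id) {
        return Optional.ofNullable(rows.get(id))
                .filter(s -> s.projectorVersion() == currentVersion)
                .map(VersionedSnapshot::memento);
    }
}
```

After an upgrade, every lookup misses until the new projector has rewritten the snapshot, and the repository simply replays the full history in the meantime.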

"I'm designing an application that will probably generate millions of events a day. What can we do if we need to rebuild a view six months later?"

One of the more compelling answers here is to model time explicitly. Do you have one entity that lives for six months, or do you have 180+ entities that each live for one day? Accounting is a good domain to reference here: at the end of the fiscal year, the books are closed, and the next year's books are opened with the carryover.
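A small Java sketch of the "close the books" idea, with one entity per day instead of one long-lived entity. DailyLedger and its fields are hypothetical; the point is only that each day's history starts from a carryover rather than from the beginning of time.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity that lives for exactly one day.
class DailyLedger {
    final LocalDate day;
    final long openingBalance;                    // carryover from the previous day
    final List<Long> entries = new ArrayList<>(); // this day's credits (> 0) and debits (< 0)

    DailyLedger(LocalDate day, long openingBalance) {
        this.day = day;
        this.openingBalance = openingBalance;
    }

    void record(long amount) {
        entries.add(amount);
    }

    long balance() {
        long total = openingBalance;
        for (long amount : entries) {
            total += amount;
        }
        return total;
    }

    // Close today's books and open tomorrow's with the carryover.
    DailyLedger closeAndOpenNext() {
        return new DailyLedger(day.plusDays(1), balance());
    }
}
```

Rebuilding a view for a given day then only needs that day's entries plus its opening balance, not six months of events.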

Yves Reynhout frequently talks about modeling time and scheduling; Evolving a Model may be a good starting point.

VoiceOfUnreason answered Sep 22 '22 02:09