How should I serialize domain model snapshots for event sourcing?

We are building an application using the LMAX Disruptor. When using Event Sourcing, you often want to persist periodic snapshots of your domain model (some people call this the Memory Image pattern).

I need a better solution than what we are currently using to serialize our domain model when taking a snapshot. I want to be able to "pretty-print" this snapshot in a readable format for debugging, and I want to simplify snapshot schema migration.

Currently, we are using Google's Protocol Buffers to serialize our domain model to a file. We chose this solution because protocol buffers are more compact than XML / JSON, and using a compact binary format seemed like a good idea for serializing a big Java domain model.

The problem is, Protocol Buffers were designed for relatively small messages, and our domain model is quite big. So the domain model does not fit in one big hierarchical protobuf message, and we end up serializing various protobuf messages to a file, like this:

for each account {
    write simple account fields (id, name, description) as one protobuf message
    write number of user groups
    for each user group {
        convert user group to protobuf message, and serialize it
    }
    for each user {
        convert user to protobuf message, and serialize it
    }
    for each sensor {
        convert sensor to protobuf message, and serialize it
    }
    ...
}
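Concretely, the write side looks roughly like this sketch in Java (simplified; AccountMessage and friends stand for our generated protobuf classes, and toMessage() is our own mapping code):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;

// simplified sketch of our current snapshot writer
try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(snapshotFile)))) {
    for (Account account : accounts) {
        // writeDelimitedTo length-prefixes each message, so the reader
        // can split the stream back into individual messages
        account.toMessage().writeDelimitedTo(out);
        // the reader needs this count to know how many messages follow
        out.writeInt(account.getUserGroups().size());
        for (UserGroup group : account.getUserGroups()) {
            group.toMessage().writeDelimitedTo(out);
        }
        // ... same pattern for users, sensors, etc.
    }
}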

This is annoying, because manipulating a stream of heterogeneous protobuf messages is complicated. It would be a lot easier if we had one big protobuf message that contained our whole domain model, like this:

public class AggregateRoot {
    List<Account> accounts;
}

--> convert to big hierarchical protobuf message using some mapping code:

message AggregateRootMessage {
    repeated AccountMessage accounts = 1;
}

--> persist this big message to a file

If we do this, it's easy to pretty-print a snapshot: simply read the big protobuf message, then pretty-print it using protobuf's TextFormat. With our current approach, we need to read the various protobuf messages one by one and pretty-print them, which is harder: the order of the protobuf messages in the stream depends on the current snapshot schema, so our pretty-printing tool needs to be aware of that.
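In other words, the pretty-printing tool would shrink to something like this sketch (using the generated AggregateRootMessage class from above):

import com.google.protobuf.TextFormat;
import java.io.FileInputStream;
import java.io.InputStream;

// read the single snapshot message and dump it in protobuf text format
try (InputStream in = new FileInputStream(snapshotFile)) {
    AggregateRootMessage snapshot = AggregateRootMessage.parseFrom(in);
    System.out.println(TextFormat.printToString(snapshot));
}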

I also need a tool to migrate snapshots to the new snapshot schema when our domain model evolves. I'm still working on this tool, but it's hard, because I have to deal with a stream of various protobuf messages, instead of dealing with just one big message. If it were just one big message, I could:

  • take the snapshot file
  • parse the file as a big Java protobuf message, using the .proto schema for the previous snapshot version
  • convert this big protobuf message into a big protobuf message for the new version, using Dozer and some mapping code
  • write this new protobuf message in a new file, using the .proto schema for the new version

But since I am dealing with a stream of protobuf messages of various types, my tool needs to handle this stream in the correct order.
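For comparison, here is roughly what that migration would look like in the single-message case (sketch; AggregateRootMessageV1 / V2 stand for the classes generated from the old and new .proto schemas, and mapV1ToV2 is the hand-written or Dozer-based mapping code):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

try (InputStream in = new FileInputStream(oldSnapshotFile);
     OutputStream out = new FileOutputStream(newSnapshotFile)) {
    // parse with the old schema's generated class
    AggregateRootMessageV1 oldSnapshot = AggregateRootMessageV1.parseFrom(in);
    // map to the new schema
    AggregateRootMessageV2 newSnapshot = mapV1ToV2(oldSnapshot);
    // write with the new schema's generated class
    newSnapshot.writeTo(out);
}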

So, yeah... I guess my questions are:

  • Do you know any serialization tool that can serialize a big domain model into a file, without protobuf's limitations, possibly using streaming to avoid OutOfMemoryErrors?

  • If you use event sourcing or memory images, what do you use to serialize your domain model? JSON? XML? Protobuf? Something else?

  • Are we doing it wrong? Do you have any suggestions?

asked May 31 '13 by Etienne Neveu



3 Answers

The way I would approach this problem is by separating the 'specification' from the 'transfer syntax'. Once the message specifications are defined, we need a wire-line representation that can support different needs, trading off machine efficiency against human readability:

  • binary mode - the least verbose, but not human readable
  • character mode - commands and parameters represented as characters; more readable, and provides more robust storage
  • clear text - say, for debugging purposes

The solution must provide switchable behavior between these encodings. We can base it on ASN.1 and its related tool-set, which is both language and platform agnostic, and a rich ecosystem is available for Java (Bouncy Castle et al.). We have used it with fairly large message blobs over the network with no known issues :)
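For illustration, a minimal sketch with Bouncy Castle's ASN.1 classes (assuming an account with just an id and a name), showing the same structure encoded as compact binary and dumped as readable text:

import org.bouncycastle.asn1.ASN1EncodableVector;
import org.bouncycastle.asn1.ASN1Integer;
import org.bouncycastle.asn1.DERSequence;
import org.bouncycastle.asn1.DERUTF8String;
import org.bouncycastle.asn1.util.ASN1Dump;

// build the ASN.1 structure once...
ASN1EncodableVector fields = new ASN1EncodableVector();
fields.add(new ASN1Integer(42L));               // account id (hypothetical)
fields.add(new DERUTF8String("test-account")); // account name (hypothetical)
DERSequence account = new DERSequence(fields);

// ...then pick the representation you need
byte[] binary = account.getEncoded();             // compact binary (DER)
String readable = ASN1Dump.dumpAsString(account); // human-readable dump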

Hope it gives some pointers.

answered Oct 13 '22 by Nitin Tripathi


Just off the top of my head (without actually knowing how big your snapshot files would get):

Have you tried Google's Gson JSON library? It seems to provide both versioning (https://sites.google.com/site/gson/gson-user-guide#TOC-Versioning-Support) and streaming (https://sites.google.com/site/gson/streaming) for JSON-based documents.
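Something along these lines (sketch; Account stands in for one of your domain classes, and accounts for your collection of them):

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.annotations.Since;
import com.google.gson.stream.JsonWriter;
import java.io.FileWriter;

class Account {
    @Since(1.0) long id;
    @Since(1.0) String name;
    @Since(2.0) String description; // field added in schema version 2.0
}

// versioning: a Gson configured with setVersion(1.0) would skip @Since(2.0) fields
Gson gson = new GsonBuilder().setVersion(2.0).create();

// streaming: write accounts one by one instead of building one huge string
try (JsonWriter writer = new JsonWriter(new FileWriter("snapshot.json"))) {
    writer.setIndent("  "); // pretty-print for readability
    writer.beginArray();
    for (Account account : accounts) {
        gson.toJson(account, Account.class, writer);
    }
    writer.endArray();
}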

And now that we are talking JSON, how about storing the snapshots in e.g. CouchDB (http://en.wikipedia.org/wiki/CouchDB) documents?

JSON may take a bit more space, but it is readable.

answered Oct 13 '22 by Jukka


The best list of options I've seen is here: https://github.com/eishay/jvm-serializers/wiki. You'll have to do some quick tests to see what's fast for you. Regarding streaming, I'd have to look through each of the libraries in that list.

Not sure I understand the pretty-printing problem. It doesn't seem necessary to solve efficient serialization and pretty-printing with the same technology, since pretty-printing surely doesn't have to be done super efficiently. If you already have a JavaBean representation, then I'd probably reload the data into beans and use Jackson to print the data as JSON.
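For example (sketch; assuming you can reload the snapshot into a bean like your AggregateRoot with your existing reader):

import com.fasterxml.jackson.databind.ObjectMapper;

// reload the snapshot with your existing protobuf reader (hypothetical helper),
// then let Jackson handle the pretty-printing
AggregateRoot root = loadSnapshot(snapshotFile);
String pretty = new ObjectMapper()
        .writerWithDefaultPrettyPrinter()
        .writeValueAsString(root);
System.out.println(pretty);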

Regarding versioning/migrations, have you already solved the problem of how to start a new version of the code that's running the new domain model? If yes, then why not just create a new snapshot after the new version starts?

answered Oct 12 '22 by jtoberon