Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why should the event store be on the write side?

Event sourcing is pitched as a bonus for a number of things, e.g. event history / audit trail, complete and consistent view regeneration, etc. Sounds great. I am a fan. But those are read-side implementation details, and you could accomplish the same by moving the event store completely to the read side as another subscriber.. so why not?

Here's some thoughts:

  1. The views/denormalizers themselves don't care about an event store. They just handle events from the domain.
  2. Moving the event store to the read side still gives you event history / audit
  3. You can still regenerate your views from the event store. Except now it need not be a write model leak. Give him read model citizenship!

Here seems to be one technical argument for keeping it on the write side. This from Greg Young at http://codebetter.com/gregyoung/2010/02/20/why-use-event-sourcing/:

There are however some issues that exist with using something that is storing a snapshot of current state. The largest issue revolves around the fact that you have introduced two models to your data. You have an event model and a model representing current state.

The thing I find interesting about this is the term "snapshot", which more recently has become a distinguished term in event sourcing as well. Introducing an event store on the write side adds some overhead to loading aggregates. You can debate just how much overhead, but it's apparently a perceived or anticipated problem, since there is now the concept of loading aggregates from a snapshot and all events since the snapshot. So now we have... two models again. And not only that, but the snapshotting suggestions I've seen are intended to be implemented as an infrastructure leak, with a background process going over your entire data store to keep things performant.

And after a snapshot is taken, events before the snapshot become 100% useless from the write perspective, except... to rebuild the read side! That seems wrong.

Another performance related topic: file storage. Sometimes we need to attach large binary files to entities. Conceptually, sometimes these are associated with entities, but sometimes they ARE the entities. Putting these in the event store means you have to physically load that data each and every time you load the entity. That's bad enough, but imagine several or hundreds of these in a large aggregate. Every answer I have seen to this is to basically bite the bullet and pass a uri to the file. That is a cop-out, and undermines the distributed system.

Then there's maintenance. Rebuilding views requires a process involving the event store. So now every view maintenance task you ever write further binds your write model into using the event store.. forever.

Isn't the whole point of CQRS that the use cases around the read model and write model are fundamentally incompatible? So why should we put read model stuff on the write side, sacrificing flexibility and performance, and coupling them back up again. Why spend the time?

So all in all, I am confused. In all respects from where I sit, the event store makes more sense as a read model detail. You still achieve the many benefits of keeping an event store, but you don't over-abstract write side persistence, possibly reducing flexibility and performance. And you don't couple your read/write side back up by leaky abstractions and maintenance tasks.

So could someone please explain to me one or more compelling reasons to keep it on the write side? Or alternatively, why it should NOT go on the read side as a maintenance/reporting concern? Again, I'm not questioning the usefulness of the store. Just where it should go :)

like image 786
Jeremy Rosenberg Avatar asked May 14 '12 17:05

Jeremy Rosenberg


2 Answers

This is a long dead question that someone pointed me to. There are quite a few reasons why its better to store events on the write side.

From my understanding the architecture you are talking about is a very common one that I see ... fail. We will store our domain model in a relational database then put out events. You add the twist of them saving the events on the read side in an event store. This will likely lead to a mess.

The first issue you will run into is in the publishing of your events. What happens when I save to the database and publish to say MSMQ (I die in the middle). So DTC gets introduced between them. This is a huge thing to bring in, distributed transactions should be avoided like the plague. It is also quite inefficient as I am probably making the data durable twice (once to queue once to database). This will limit system throughput by a lot (DTC benchmarks of 200-300 messages/second are common, with events only 20-30k/second is common).

Some work around the need for DTC by putting a table in their database that has the events and operates as a queue. This will avoid the need for DTC however this will still run into the next issue.

What happens when you have a bug? I know you would never write buggy code but one of the Jrs/maintenance developers later working with the project. As an example what happens when the domain object change and the event raised do not match? Say you set State on your domain object to "LA" (hardcoded) but you properly set State on the event to cmd.State ("CT").

How will you detect such errors are occurring? The biggest problem with what is being discussed is that there are now two sources of "truth" there is the database on the write side and the event stream coming out. There is no way to prove that they are equivalent. This will cause all sorts of weird bugs down the line.

like image 138
Greg Young Avatar answered Jan 02 '23 04:01

Greg Young


I think this is really an excellent question. Treating your aggregate as a sequence of events is useful in its own right on the write side, making command retries and the like easier. But I agree that it seems upsetting to work to create your events, then have to make yet another model of your object for persistence if you need this snapshotting performance improvement.

A system where your aggregates only stored snapshots, but sent events to the read-model for projection into read models would I think be called "CQRS", just not "Event Sourcing". If you kept the events around for re-projection, I guess you'd have a system that was very much both.

But then wouldn't you have three definitions? One for persisting your aggregates, one for communicating state changes, and any number more for answering queries?

In such a system it would be tempting to start answering queries by loading your aggregates and asking them questions directly. While this isn't forbidden by any means, it does tend to start causing those aggregates to accrete functionality they might not otherwise need, not to mention complicating threading and transactions.

like image 37
Sebastian Good Avatar answered Jan 02 '23 05:01

Sebastian Good