
How should Event Sourcing event handlers be hosted to construct a read model?

There are various example applications and frameworks that implement a CQRS + Event Sourcing architecture, and most describe using an event handler to create a denormalized view from the domain events stored in an event store.

One example of hosting this architecture is as a web API that accepts commands on the write side and supports querying the denormalized views. This web API is likely scaled out to many machines in a load-balanced farm.

My question is: where should the read model event handlers be hosted?

Possible scenarios:

  1. Hosted in a single Windows service on a separate host. If so, wouldn't that create a single point of failure? It probably complicates deployment too, but it does guarantee a single thread of execution. The downside is that the read model could exhibit increased latency.

  2. Hosted as part of the web API itself. If I'm using EventStore, for example, for event storage and event subscription handling, will multiple handlers (one in each web farm process) fire for each single event and thereby cause contention in the handlers as they try to read/write to their read store? Or are we guaranteed that, for a given aggregate instance, all of its events will be processed one at a time, in event version order?

I'm leaning towards scenario 2 as it simplifies deployment and also supports process managers that need to listen to events. It raises the same concern, though: only one event handler should handle any single event.

Can EventStore handle this scenario? How are others handling processing of events in eventually consistent architectures?

EDIT:

To clarify, I'm talking about the process of extracting event data into the denormalized tables, rather than the reading of those tables for the "Q" in CQRS.

I guess what I'm looking for are options for how we "should" implement and deploy the event processing for read models/sagas/etc. in a way that supports redundancy and scale, assuming of course that the processing of events is handled idempotently (a sketch of what that can look like follows below).
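To make the idempotency assumption concrete, here is a minimal sketch of a projection handler that records the last applied event version per row and ignores duplicates or replays. The table layout, event shape, and names (order_summary, OrderPlaced, etc.) are hypothetical, chosen only for illustration:

```python
import sqlite3

def apply_event(conn, event):
    """Apply a domain event to a denormalized row, ignoring duplicates."""
    row = conn.execute(
        "SELECT version FROM order_summary WHERE order_id = ?",
        (event["aggregate_id"],),
    ).fetchone()
    current = row[0] if row else 0

    # Idempotency guard: skip any event version we have already applied,
    # so redelivery or replay leaves the read model unchanged.
    if event["version"] <= current:
        return

    if event["type"] == "OrderPlaced":
        conn.execute(
            "INSERT INTO order_summary (order_id, total, status, version) "
            "VALUES (?, ?, 'placed', ?)",
            (event["aggregate_id"], event["data"]["total"], event["version"]),
        )
    elif event["type"] == "OrderShipped":
        conn.execute(
            "UPDATE order_summary SET status = 'shipped', version = ? "
            "WHERE order_id = ?",
            (event["version"], event["aggregate_id"]),
        )
    conn.commit()

# Duplicate delivery is a no-op thanks to the version guard.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_summary "
    "(order_id TEXT PRIMARY KEY, total REAL, status TEXT, version INTEGER)"
)
placed = {"aggregate_id": "o-1", "type": "OrderPlaced",
          "version": 1, "data": {"total": 9.99}}
apply_event(conn, placed)
apply_event(conn, placed)  # second delivery: no effect
```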

I've read of two possible solutions for processing data saved as events in an event store, but I don't understand when one should be used over the other.

Event bus

An event bus/queue is used to publish messages after an event is saved, usually by the repository implementation. Interested parties (subscribers), such as read models or sagas/process managers, use the bus/queue "in some way" to process events idempotently.

If the queue is pub/sub, this implies that each downstream dependency (read model, sagas, etc.) can only support one subscribing process each. More than one process would mean each processes the same event and they then compete to make the changes downstream. Idempotent handling should take care of consistency/concurrency issues.

If the queue uses competing consumers, we at least have the possibility of hosting subscribers in each web farm node for redundancy. This does, however, require a queue per downstream dependency: one for sagas/process managers, one for each read model, and so on. The repository would then have to publish to each of them for eventual consistency (sketched below).
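A rough sketch of that layout, assuming hypothetical event_store and bus objects rather than any specific product's API. Note the gap between appending and publishing, which is the dual-write problem the answer below raises:

```python
# One queue per downstream dependency; every subscriber type gets its own
# copy of each event, and each queue can have competing consumers.
DOWNSTREAM_QUEUES = ["read-model-orders", "sagas-shipping"]

class Repository:
    def __init__(self, event_store, bus):
        self.event_store = event_store  # source of truth
        self.bus = bus                  # hypothetical queue/bus client

    def save(self, aggregate):
        events = aggregate.uncommitted_events()
        # 1. Append to the event store first.
        self.event_store.append(
            aggregate.id, events, expected_version=aggregate.version
        )
        # 2. Publish to every downstream queue. A crash between steps 1
        #    and 2 saves events without publishing them: the dual-write
        #    problem that DTC/two-phase commit would otherwise address.
        for queue in DOWNSTREAM_QUEUES:
            for event in events:
                self.bus.send(queue, event)
```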

Subscription/feed

A subscription/feed where interested parties (subscribers) read an event stream on demand, fetching events from a known checkpoint for processing into a read model.

This looks great for recreating read models if necessary. However, as with the usual pub/sub pattern, it would seem only one subscriber process per downstream dependency should be used: if we register multiple subscribers for the same event stream, one in each web farm node for example, they will all attempt to process and update the same read model.
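The checkpoint-driven loop might look like the following sketch, where event_store.read_from and checkpoint_store are placeholders rather than any specific client API. Rebuilding a read model is then just resetting the checkpoint to zero and letting the loop catch up:

```python
import time

def run_projection(event_store, checkpoint_store, handler, batch_size=100):
    """Pull events forward from a stored checkpoint, applying them in order."""
    position = checkpoint_store.load()   # last processed position, 0 if new
    while True:
        batch = event_store.read_from(position, batch_size)
        if not batch:
            time.sleep(0.5)              # caught up; poll for new events
            continue
        for event in batch:
            handler(event)               # handler must be idempotent
            position = event.position
        checkpoint_store.save(position)  # crash before save = safe replay
```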

asked Oct 22 '16 by Mark

1 Answer

In our project we use subscription-based projections. The reasons for this are:

  • Committing on the write side must be transactional, and if you use two pieces of infrastructure (an event store and a message bus) you have to start using DTC, or otherwise you risk your events being saved to the store but not published to the bus, or the other way around, depending on your implementation. DTC and two-phase commits are nasty things and you do not want to go that way.
  • Events are usually published on the message bus anyway (we do it via subscriptions too) for event-driven communication between different bounded contexts. If you use message subscribers to update your read model, then when you decide to rebuild the read model, your other subscribers will get these messages too, and this will bring the system into an invalid state. I think you had already thought about this when saying you must only have one subscriber for each published message type.
  • Message bus consumers have no guarantee on message order, and this can leave your read model in a mess.
  • Message consumers usually handle retries by sending the message back to the queue, usually to the end of the queue. This means that your events can become heavily out of order. In addition, after some number of retries the message consumer usually gives up on the poison message and puts it in a DLQ. If that happened in your projection, one update would be ignored whilst others were processed, leaving your read model in an inconsistent (invalid) state.

Considering these reasons, we have single-threaded subscription-based projections that can do whatever is needed. You can run different types of projections with their own checkpoints, subscribing to the event store using catch-up subscriptions. We host them in the same process as many other things for the sake of simplicity, but this process only runs on one machine. Should we want to scale out this process, we would have to take the subscriptions/projections out. This can easily be done, since this part has virtually no dependencies on other modules, except the read model DTOs themselves, which can be shared as an assembly anyway.

By using subscriptions you always project events that have already been committed. If something goes wrong with the projections, the write side is definitely the source of truth and remains so; you just need to fix the projection and run it again.

We have two separate subscriptions: one for projecting to the read model and another for publishing events to the message bus. This construct has proven to work very well.
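As a rough illustration of that split (all names hypothetical, reusing the run_projection catch-up loop sketched in the question above), each subscription keeps its own checkpoint, so resetting the projection's checkpoint replays events into the read model without re-publishing anything to the bus:

```python
import threading

def project_to_read_model(event):
    ...  # idempotent read-model update, as in the earlier sketch

def publish_to_bus(event):
    ...  # forward the committed event to the message bus

# Two independent subscriptions with separate checkpoints. Rebuilding the
# read model means resetting only "read-model-checkpoint"; the publisher
# keeps its own position and never re-sends old events.
for name, handler in [
    ("read-model-checkpoint", project_to_read_model),
    ("bus-publisher-checkpoint", publish_to_bus),
]:
    threading.Thread(
        target=run_projection,  # catch-up loop from the earlier sketch
        args=(event_store, CheckpointStore(name), handler),
        daemon=True,
    ).start()
```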

answered Apr 30 '23 by Alexey Zimarev