 

Event sourcing - why a dedicated event store?

I am trying to implement event sourcing/CQRS/DDD for the first time, mostly for learning purposes. There is the idea of an event store and a message queue such as Apache Kafka, with events flowing from event store => Kafka Connect JDBC/Debezium CDC => Kafka.

I am wondering why there needs to be a separate event store when it sounds like its purpose can be fulfilled by Kafka itself, using its main features plus log compaction or log retention configured for permanent storage. Should I store my events in a dedicated store such as an RDBMS to feed into Kafka, or should I feed them straight into Kafka?


asked Jan 01 '23 by atkayla


2 Answers

Much of the literature on event sourcing and CQRS comes from the domain-driven design community; in its earliest form, CQRS was called DDDD: Distributed Domain-Driven Design.

One of the common patterns in domain driven design is to have a domain model ensuring the integrity of the data in your durable storage, which is to say, ensuring that there are no internal contradictions...

> I am wondering why there needs to be a separate event store when it sounds like its purpose can be fulfilled by Kafka itself, using its main features plus log compaction or log retention configured for permanent storage.

So if we want an event stream with no internal contradictions, how do we achieve that? One way is to ensure that only a single process has permission to modify the stream. Unfortunately, that leaves you with a single point of failure -- the process dies, and everything comes to an end.

On the other hand, if you have multiple processes updating the same stream, then you have the risk of concurrent writes, data races, and contradictions being introduced because one writer couldn't yet see what the other did.

With an RDBMS or an Event Store, we can solve this problem by using transactions, or compare-and-swap semantics; an attempt to extend the stream with new events is rejected if there has been a concurrent modification.
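
For illustration, here is a minimal sketch of that compare-and-swap style append using an RDBMS via JDBC. The events table, its columns, and the UNIQUE (stream_id, version) constraint are assumptions made for the example, not something prescribed by any particular event store:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class EventStreamAppender {

    /**
     * Tries to append one event at position expectedVersion + 1 of the given stream.
     * Assumes a table events(stream_id, version, payload) with UNIQUE (stream_id, version),
     * so a concurrent writer that claims the same version first causes this insert to fail
     * instead of silently interleaving with it.
     */
    public boolean tryAppend(Connection conn, String streamId,
                             long expectedVersion, String payload) throws SQLException {
        String sql = "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, streamId);
            stmt.setLong(2, expectedVersion + 1); // the position we expect to be free
            stmt.setString(3, payload);
            stmt.executeUpdate();
            return true;                          // no concurrent modification; append accepted
        } catch (SQLException e) {
            // SQLState class 23 = integrity constraint violation: another writer already
            // appended this version, so the caller should reload the stream and retry.
            if (e.getSQLState() != null && e.getSQLState().startsWith("23")) {
                return false;
            }
            throw e;
        }
    }
}
```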

Furthermore, because of its DDD heritage, it is common for the durable store to be divided into many very fine-grained partitions (aka "aggregates"). A single shopping cart might reasonably have four streams dedicated to it.

If Kafka lacks those capabilities, then it is going to be a lousy replacement for an event store. KAFKA-2260 has been open for more than four years now, so we seem to be lacking the first. From what I've been able to discern from the Kafka literature, it isn't happy about fine-grained streams either (although it's been a while since I checked; perhaps things have changed).

See also: Jesper Hammarbäck writing about this 18 months ago, and reaching similar conclusions to those expressed here.

answered Jan 05 '23 by VoiceOfUnreason


Kafka can be used as a DDD event store, but doing so involves some complications because of the features it is missing.

Two key features that people use with event sourcing of aggregates are:

  1. Load an aggregate by reading only that aggregate's events (sketched in code after this list)
  2. When concurrently writing new events for an aggregate, ensure only one writer succeeds, to avoid corrupting the aggregate and breaking its invariants.
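
As a rough sketch of what feature 1 looks like in code (the EventStore interface, the Cart aggregate, and its apply method below are hypothetical placeholders, not any particular library):

```java
import java.util.List;

// Hypothetical placeholder event type for the sketch.
interface CartEvent {}

interface EventStore {
    // Feature 1: read only the events belonging to one aggregate, in order.
    List<CartEvent> readStream(String aggregateId);
}

class Cart {
    // Rebuild the aggregate by folding over its own event stream.
    static Cart rehydrate(EventStore store, String cartId) {
        Cart cart = new Cart();
        for (CartEvent event : store.readStream(cartId)) {
            cart.apply(event);
        }
        return cart;
    }

    void apply(CartEvent event) {
        // Update the cart's in-memory state from one recorded event.
    }
}
```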

Kafka currently can't do either of these. Feature 1 fails because you generally need one stream per aggregate type (one stream per aggregate doesn't scale, and wouldn't necessarily be desirable anyway), so there is no way to load just the events for a single aggregate. Feature 2 fails because https://issues.apache.org/jira/browse/KAFKA-2260 has not been implemented.

So you have to write the system in such a way that capabilities 1 and 2 aren't needed. This can be done as follows (a rough sketch of the command-handling steps appears after the list):

  1. Rather than invoking command handlers directly, write the commands to streams. Have a command stream per aggregate type, sharded by aggregate id (these don't need permanent retention). This ensures that you only ever process a single command for a particular aggregate at a time.
  2. Write snapshotting code for all your aggregate types
  3. When processing a command message, do the following:
    1. Load the aggregate snapshot
    2. Validate the command against it
    3. Write the new events (or return failure)
    4. Apply the events to the aggregate
    5. Save a new aggregate snapshot, including the current stream offset for the event stream
    6. Return success to the client (via a reply message perhaps)
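
Here is the promised rough sketch of steps 1-6 for a single command message. Every type in it (SnapshotStore, EventLog, Cart, CartCommand, CartEvent) is a hypothetical stand-in rather than a real Kafka API; it only illustrates the ordering of the steps.

```java
import java.util.List;

// Hypothetical placeholder types for the sketch.
interface CartCommand {}
interface CartEvent {}

interface Cart {
    List<CartEvent> decide(CartCommand command); // empty list means the command is rejected
    Cart apply(List<CartEvent> events);          // returns the updated aggregate state
}

interface SnapshotStore {
    CartSnapshot load(String cartId);            // returns an empty initial snapshot for a new aggregate
    void save(String cartId, CartSnapshot snapshot);
}

interface EventLog {
    long append(String cartId, List<CartEvent> events); // returns the offset of the last event written
}

record CartSnapshot(Cart state, long lastEventOffset) {}

class CommandProcessor {
    private final SnapshotStore snapshots;
    private final EventLog eventLog;

    CommandProcessor(SnapshotStore snapshots, EventLog eventLog) {
        this.snapshots = snapshots;
        this.eventLog = eventLog;
    }

    /** Handles one command for one aggregate; the return value stands in for the reply message. */
    boolean handle(String cartId, CartCommand command) {
        CartSnapshot snapshot = snapshots.load(cartId);             // 1. load the aggregate snapshot
        List<CartEvent> events = snapshot.state().decide(command);  // 2. validate the command against it
        if (events.isEmpty()) {
            return false;                                           // 3. ...or return failure
        }
        long offset = eventLog.append(cartId, events);              // 3. write the new events
        Cart updated = snapshot.state().apply(events);              // 4. apply the events to the aggregate
        snapshots.save(cartId, new CartSnapshot(updated, offset));  // 5. save a new snapshot + stream offset
        return true;                                                // 6. return success to the client
    }
}
```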

The only other problem is handling failures (such as snapshotting failing). This can be handled during startup of a particular command processing partition: it simply needs to replay any events written since the last successful snapshot and update the corresponding snapshots before resuming command processing.

Kafka Streams appears to have the features to make this very simple - you have a KStream of commands that you transform into a KTable (containing snapshots, keyed by aggregate id) and a KStream of events (and possibly another stream containing responses). Kafka allows all this to work transactionally, so there is no risk of failing to update the snapshot. It will also handle migrating partitions to new servers, etc. (automatically loading the snapshot KTable into a local RocksDB when this happens).
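
A minimal sketch of that topology, with plenty of assumptions: the topic names, the Cart/CartCommand/CartEvent types, their serde, and the decide/apply logic are all placeholders, and the snapshot "table" is approximated here with a key-value state store updated inside a stateful transform, which is one common way to produce both the snapshots and the output event stream from the same step:

```java
import java.util.Collections;
import java.util.List;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class CartTopology {

    public static void define(StreamsBuilder builder) {
        // Local, fault-tolerant snapshot store: one Cart per cart id, backed by RocksDB
        // and a changelog topic so it can be rebuilt when partitions move.
        builder.addStateStore(Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("cart-snapshots"),
                Serdes.String(), cartSerde()));

        // Commands arrive keyed by cart id, so all commands for one cart land on the
        // same partition and are processed by a single task, one at a time.
        KStream<String, CartCommand> commands = builder.stream("cart-commands");

        KStream<String, CartEvent> events = commands.flatTransformValues(
                () -> new ValueTransformerWithKey<String, CartCommand, Iterable<CartEvent>>() {
                    private KeyValueStore<String, Cart> store;

                    @Override
                    public void init(ProcessorContext context) {
                        store = context.getStateStore("cart-snapshots");
                    }

                    @Override
                    public Iterable<CartEvent> transform(String cartId, CartCommand command) {
                        Cart current = store.get(cartId);                    // load the snapshot
                        if (current == null) {
                            current = Cart.initial();                        // brand-new aggregate
                        }
                        List<CartEvent> newEvents = current.decide(command); // validate the command
                        if (newEvents.isEmpty()) {
                            return Collections.emptyList();                  // rejected: emit nothing
                        }
                        store.put(cartId, current.apply(newEvents));         // apply and save snapshot
                        return newEvents;                                    // emit the new events
                    }

                    @Override
                    public void close() {}
                },
                "cart-snapshots");

        events.to("cart-events");
    }

    // --- hypothetical placeholders, not part of Kafka ---
    interface CartCommand {}
    interface CartEvent {}
    interface Cart {
        List<CartEvent> decide(CartCommand command); // empty list means the command is rejected
        Cart apply(List<CartEvent> events);

        static Cart initial() {
            throw new UnsupportedOperationException("placeholder: supply a real empty Cart");
        }
    }
    private static Serde<Cart> cartSerde() {
        throw new UnsupportedOperationException("placeholder: supply a real Cart serde");
    }
}
```

With this shape, enabling exactly-once processing makes the state store's changelog and the output event topic be written in the same Kafka transaction, which is what gives the "no risk of failing to update the snapshot" property mentioned above.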

answered Jan 05 '23 by TomW