Using Kafka as a (CQRS) Eventstore. Good idea?

UPDATE

Kafka documentation:

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

UPDATE 2

One concern with using Kafka for event sourcing is the number of required topics. Typically in event sourcing, there is a stream (topic) of events per entity (such as user, product, etc). This way, the current state of an entity can be reconstituted by re-applying all events in the stream. Each Kafka topic consists of one or more partitions and each partition is stored as a directory on the file system. There will also be pressure from ZooKeeper as the number of znodes increases.
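To make "reconstituted by re-applying all events" concrete, here is a minimal sketch of the idea in plain Python. It does not use Kafka at all; the `Event` type, the event names and the account example are assumptions made up for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event names: "deposited" / "withdrawn"
    amount: int


def apply(balance: int, event: Event) -> int:
    """Apply a single event to the current state and return the new state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    return balance  # unknown events are ignored in this sketch


def reconstitute(events: list[Event]) -> int:
    """Replay the entity's stream in order, starting from the empty state."""
    return reduce(apply, events, 0)


stream = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
print(reconstitute(stream))  # 75
```

The point is that the replay only needs the events of one entity, which is why a stream per entity is the natural unit.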

I keep coming back to this Q&A, and I did not find the existing answers nuanced enough, so I am adding this one.

TL;DR. Yes or No, depending on your event sourcing usage.

There are two primary kinds of event sourced systems of which I am aware.

Downstream event processors = Yes

In this kind of system, events happen in the real world and are recorded as facts, such as in a warehouse system that keeps track of pallets of products. There are basically no conflicting events. Everything has already happened, even if it was wrong (e.g. pallet 123456 was put on truck A, but was scheduled for truck B). Later, the facts are checked for exceptions via reporting mechanisms. Kafka seems well-suited for this kind of downstream event-processing application.

In this context, it is understandable why Kafka folks are advocating it as an Event Sourcing solution: it is quite similar to how Kafka is already used in, for example, click streams. However, people using the term Event Sourcing (as opposed to Stream Processing) are likely referring to the second usage...

Application-controlled source of truth = No

This kind of application declares its own events as a result of user requests passing through business logic. Kafka does not work well in this case for two primary reasons.

Lack of entity isolation

This scenario needs the ability to load the event stream for a specific entity. The common reason for this is to build a transient write model for the business logic to use to process the request. Doing this is impractical in Kafka. Using topic-per-entity could allow this, except this is a non-starter when there may be thousands or millions of entities. This is due to technical limits in Kafka/ZooKeeper.

One of the main reasons to use a transient write model in this way is to make business logic changes cheap and easy to deploy.

Using topic-per-type is recommended instead for Kafka, but this would require loading events for every entity of that type just to get events for a single entity, since you cannot tell by log position which events belong to which entity. Even using snapshots to start from a known log position, this could be a significant number of events to churn through if structural changes to the snapshot are needed to support logic changes.
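A rough sketch of the difference, using plain Python lists as a stand-in for a log (this is not Kafka client code, and the entity IDs and event names are invented): with a shared per-type log you have to read every record to find one entity's events, whereas a store keyed by entity hands you just that entity's stream.

```python
from collections import defaultdict

# A shared per-type log: events for many entities interleaved in arrival order.
# Tuples are (entity_id, event_name); all values are made up for illustration.
user_topic = [
    ("user-1", "registered"),
    ("user-2", "registered"),
    ("user-1", "email_changed"),
    ("user-3", "registered"),
    ("user-2", "deactivated"),
]


def load_from_type_log(log, entity_id):
    """Scan the whole log; return the entity's events and how many records were read."""
    events = [name for eid, name in log if eid == entity_id]
    return events, len(log)


def build_entity_index(log):
    """What a per-entity store gives you directly: events grouped by entity ID."""
    index = defaultdict(list)
    for eid, name in log:
        index[eid].append(name)
    return index


events, records_read = load_from_type_log(user_topic, "user-1")
print(events, records_read)   # ['registered', 'email_changed'] 5

index = build_entity_index(user_topic)
print(index["user-1"])        # ['registered', 'email_changed']
```

Here two useful events cost five reads. With millions of entities in one topic, that ratio is what makes the per-request load impractical.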

Lack of conflict detection

Secondly, users can create race conditions due to concurrent requests against the same entity. It may be quite undesirable to save conflicting events and resolve them after the fact, so it is important to be able to prevent conflicting events. To scale request load, it is common to use stateless services while preventing write conflicts using conditional writes (only write if the last entity event was #x), a.k.a. optimistic concurrency. Kafka does not support optimistic concurrency. Even if it supported it at the topic level, it would need to go all the way down to the entity level to be effective. To use Kafka and prevent conflicting events, you would need to use a stateful, serialized writer (per partition, which is Kafka's equivalent of a "shard") at the application level. This is a significant architectural requirement/restriction.
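For readers unfamiliar with conditional writes, here is a minimal in-memory sketch of the "only write if the last entity event was #x" check. The class and method names are hypothetical, and a real store would have to perform the check and the append atomically (for example inside a database transaction); this sketch is not thread-safe.

```python
class ConcurrencyError(Exception):
    """Raised when the stream has moved past the version the writer expected."""


class InMemoryEventStore:
    def __init__(self):
        self._streams = {}  # stream_id -> list of events

    def load(self, stream_id):
        """Return the entity's events and its current version (event count)."""
        events = list(self._streams.get(stream_id, []))
        return events, len(events)

    def append(self, stream_id, expected_version, new_events):
        """Append only if nobody else has written since the caller loaded."""
        current = self._streams.setdefault(stream_id, [])
        if len(current) != expected_version:
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, found {len(current)}"
            )
        current.extend(new_events)
        return len(current)


store = InMemoryEventStore()

# Two stateless request handlers load the same entity at the same version.
_, version_a = store.load("account-42")
_, version_b = store.load("account-42")

store.append("account-42", version_a, ["withdrew 80"])  # first writer wins

try:
    store.append("account-42", version_b, ["withdrew 50"])  # stale writer
    conflict = False
except ConcurrencyError:
    conflict = True

print(conflict)  # True
```

The losing request can reload the stream, re-run the business logic against the new state and retry, or report the conflict to the user.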

Bonus reason: fitment for problem

added 2021/09/29

Kafka is meant to solve giant-scale data problems and has commensurate overhead to do so. An app-controlled source of truth is a smaller scale, in-depth solution. Using event sourcing to good effect requires crafting events and streams to match the business processes. This usually has a much higher level of detail than would be generally useful to other parts of a system. Consider if your bank statement contained an entry for every step of a bank's internal processes. A single transaction could have many entries before it is confirmed to your account.

When I asked myself the same question as the OP, I wanted to know if Kafka was a scaling option for event sourcing. But perhaps a better question is whether it makes sense for my event sourced solution to operate at a giant scale. I can't speak to every case, but I think often it does not. When this scale enters the picture, the granularity of events tends to be different. And my event sourced system should probably publish coarser-grained, higher-level events to the Kafka cluster rather than use it as storage.

Scale can still be needed for event sourcing. Strategies differ depending on why. Often event streams have a "done" state and can be archived if storage or volume is the issue. Sharding is another option which works especially well for regional- or tenant-isolated scenarios. In less isolated scenarios, when streams are arbitrarily related in a way that can cross shard boundaries, sharding events is still quite easy (partition by stream ID). But things get more complicated for event consumers since events come from different shards and are no longer totally ordered. For example, you can receive transaction events before you receive events describing the accounts involved. Kafka has the same issue since events are only ordered within a partition, not across partitions or topics. Ideally you design the consumer so that ordering between streams is not needed. Otherwise you resort to merging the different sources and sorting by timestamp, then by an arbitrary tie breaker (like shard ID) if the timestamps are the same. And it becomes important how far out of sync a server's clock gets.
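The merge-and-sort fallback can be sketched roughly as follows. The event tuples, timestamps and shard IDs are invented for illustration, and each shard's list is assumed to be already ordered by timestamp.

```python
import heapq

# Each event is (timestamp, shard_id, description). Every shard is already
# ordered internally, but there is no ordering between shards.
shard_0 = [(100, 0, "account A opened"), (105, 0, "account A credited")]
shard_1 = [(100, 1, "account B opened"), (103, 1, "transfer A->B requested")]


def merge_shards(*shards):
    """Merge per-shard streams by timestamp, using shard ID as the tie breaker."""
    return list(heapq.merge(*shards, key=lambda e: (e[0], e[1])))


merged = merge_shards(shard_0, shard_1)
for event in merged:
    print(event)
# (100, 0, 'account A opened')
# (100, 1, 'account B opened')
# (103, 1, 'transfer A->B requested')
# (105, 0, 'account A credited')
```

The resulting order is only as trustworthy as the timestamps: if shard 1's clock ran a few units ahead, the transfer could sort before an event it actually followed.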

Summary

Can you force Kafka to work for an app-controlled source of truth? Sure if you try hard enough and integrate deeply enough. But is it a good idea? No.

Update per comment

The comment has been deleted, but the question was something like: what do people use for event storage then?

It seems that most people roll their own event storage implementation on top of an existing database. For non-distributed scenarios, like internal back-ends or stand-alone products, it is well-documented how to create a SQL-based event store. And there are libraries available on top of various kinds of databases. There is also EventStore, which is built for this purpose.
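As a rough idea of what a SQL-based event store looks like, here is a minimal sketch. It uses SQLite from the Python standard library only so that it is self-contained; the table layout, column names and helper functions are my own assumptions, not a reference design. The primary key on (stream_id, version) is what gives you both per-entity loading and conflict detection.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        stream_id TEXT    NOT NULL,
        version   INTEGER NOT NULL,
        type      TEXT    NOT NULL,
        data      TEXT    NOT NULL,
        PRIMARY KEY (stream_id, version)
    )
    """
)


def append(stream_id, expected_version, event_type, data):
    """Write the next event; fails if another writer already took that version."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                (stream_id, expected_version + 1, event_type, json.dumps(data)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def load(stream_id):
    """Load one entity's events in order, without touching other streams."""
    rows = conn.execute(
        "SELECT version, type, data FROM events WHERE stream_id = ? ORDER BY version",
        (stream_id,),
    ).fetchall()
    return [(version, etype, json.loads(data)) for version, etype, data in rows]


first = append("cart-7", 0, "item_added", {"sku": "abc"})
second = append("cart-7", 0, "item_added", {"sku": "xyz"})  # stale writer

print(first, second)   # True False
print(load("cart-7"))  # [(1, 'item_added', {'sku': 'abc'})]
```

The same shape carries over to Postgres or another relational database, where the unique constraint does the job that Kafka has no equivalent for.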

In distributed scenarios, I've seen a couple of different implementations. Jet's Panther project uses Azure CosmosDB, with the Change Feed feature to notify listeners. Another similar implementation I've heard about on AWS is using DynamoDB with its Streams feature to notify listeners. The partition key probably should be the stream ID for best data distribution (to lessen the amount of over-provisioning). However, a full replay across streams in Dynamo is expensive (both in reads and in cost). So this implementation was also set up so that Dynamo Streams dump events to S3. When a new listener comes online, or an existing listener wants a full replay, it reads S3 to catch up first.

My current project is a multi-tenant scenario, and I rolled my own on top of Postgres. Something like Citus seems appropriate for scalability, partitioning by tenant+stream.

Kafka is still very useful in distributed scenarios. It is a non-trivial problem to expose each service's events to other services. An event store is not built for that typically, but that's precisely what Kafka does well. Each service has its own internal source of truth (could be event storage or otherwise), but listens to Kafka to know what is happening "outside". The service may also post events to Kafka to inform the "outside" of interesting things the service did.

You can use Kafka as an event store, but I do not recommend doing so, although it might look like a good choice:

Kafka only guarantees at-least-once delivery, so duplicates can end up in the event store and cannot be removed. Update: Here you can read why this is so hard with Kafka, and some recent news about how to finally achieve this behavior: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
Due to immutability, there is no way to manipulate the event store when the application evolves and events need to be transformed (there are of course methods like upcasting, but...). One might say you never need to transform events, but that is not a correct assumption: there could be situations where you keep a backup of the originals but upgrade the events to their latest versions. That is a valid requirement in event-driven architectures.
There is no place to persist snapshots of entities/aggregates, so replay will become slower and slower. Creating snapshots is a must-have feature for an event store from a long-term perspective.
Kafka partitions are distributed, and they are hard to manage and back up compared with databases. Databases are simply simpler :-)
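To illustrate the snapshot point, here is a small sketch of why replay gets slower without snapshots. The state is deliberately trivial (a running total), and the function names and the (version, state) snapshot shape are assumptions for illustration only.

```python
def apply(state, event):
    """Fold one event into the state; here the state is just a running total."""
    return state + event


def replay(events, snapshot=None):
    """Rebuild state from events.

    snapshot is an optional (version, state) pair, where version is the number
    of events already folded into state. Returns (state, events_applied).
    """
    version, state = snapshot if snapshot else (0, 0)
    tail = events[version:]  # only events recorded after the snapshot
    for event in tail:
        state = apply(state, event)
    return state, len(tail)


events = list(range(1, 1001))  # 1000 events with values 1..1000

full_state, full_cost = replay(events)
snapshot = (990, sum(events[:990]))  # snapshot taken after 990 events
snap_state, snap_cost = replay(events, snapshot)

print(full_state, full_cost)  # 500500 1000
print(snap_state, snap_cost)  # 500500 10
```

Both replays reach the same state, but the snapshot version applies 10 events instead of 1000, and that gap grows with the age of the aggregate.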

So, think twice before you make your choice. An event store built as a combination of application-layer interfaces (monitoring and management), a SQL/NoSQL store, and Kafka as the broker is a better choice than having Kafka handle both roles to create a complete, full-featured solution.

An event store is a complex service that requires more than what Kafka can offer if you are serious about applying event sourcing, CQRS, sagas and other patterns in an event-driven architecture while staying high-performance.

Feel free to challenge my answer! You might not like what I say about your favorite broker with lots of overlapping capabilities, but still, Kafka wasn't designed as an event store. It was designed as a high-performance broker and buffer at the same time, to handle fast-producer versus slow-consumer scenarios, for example.

Please look at the eventuate.io open source microservices framework to discover more about the potential problems: http://eventuate.io/

Update as of 8th Feb 2018

I am not incorporating new info from the comments, but I agree on some of those aspects. This update is more about some recommendations for a microservice event-driven platform. If you are serious about robust microservice design and the highest possible performance in general, here are a few hints you might be interested in.

Don't use Spring - it is great (I use it myself a lot), but it is heavy and slow at the same time. And it is not a microservice platform at all. It's "just" a framework to help you implement one (a lot of work behind this..). Other frameworks are "just" lightweight REST or JPA or differently focused frameworks. I recommend what is probably the best-in-class complete open source microservice platform available, which goes back to pure Java roots: https://github.com/networknt

If you wonder about performance, you can compare for yourself using the existing benchmark suite: https://github.com/networknt/microservices-framework-benchmark

Don't use Kafka at all :-)) This is half a joke. I mean, while Kafka is great, it is another broker-centric system. I think the future is in broker-less messaging systems. You might be surprised, but there are systems faster than Kafka :-), though of course you must get down to a lower level. Look at Chronicle.
For the event store I recommend a superior PostgreSQL extension called TimescaleDB, which focuses on high-performance time-series data processing (events are time series) in large volumes. Of course CQRS and event sourcing (replay, etc. features) are built into the light4j framework out of the box, which uses Postgres as its low-level storage.
For messaging, try looking at Chronicle Queue, Map, Engine, Network. I mean, get rid of these old-fashioned broker-centric solutions and go with a micro messaging system (an embedded one). Chronicle Queue is actually even faster than Kafka. But I agree it is not an all-in-one solution and you need to do some development; otherwise you go and buy the Enterprise version (the paid one). In the end, the effort to build your own messaging layer from Chronicle will be paid back by removing the burden of maintaining the Kafka cluster.

Yes, you can use Kafka as an event store. It works quite well, especially with the introduction of Kafka Streams, which provides a Kafka-native way to process your events into accumulated state that you can query.

Regarding:

Ability to replay the eventlog which allows the ability for new subscribers to register with the system after the fact.

This can be tricky. I covered that in detail here: https://stackoverflow.com/a/48482974/741970
