
Is it ok to use Apache Kafka "infinite retention policy" as a base for an Event sourced system with CQRS?

Tags:

apache-kafka

cqrs

event-sourcing

apache-kafka-streams

eventsource

I'm currently evaluating options for designing and implementing an Event Sourcing + CQRS architecture. Since we want to use Apache Kafka for other purposes (normal pub-sub messaging and stream processing), the next logical question is: can we use Kafka as the event store for CQRS? More importantly, would that be a smart decision?

Right now I'm unsure about this. This source seems to support it: https://www.confluent.io/blog/okay-store-data-apache-kafka/

This other source recommends against that: https://medium.com/serialized-io/apache-kafka-is-not-for-event-sourcing-81735c3cf5c

In my current tests and experiments I'm running into problems similar to those described by the second source:

1. Recomposing an entity: Kafka doesn't seem to support fast retrieval or searching of specific events within a topic. For example, fetching all the commands in an order's history, which is necessary to reconstruct the entity instance, seems to require scanning all the topic's events and filtering only those that match the entity instance's identifier, which is a no-go. [This other person seems to have arrived at a similar conclusion: Query Kafka topic for specific record. That is, it is just not possible without relying on some hacky trick.]
2. Write consistency: Kafka doesn't support transactional atomicity on its store, so a common practice seems to be to put a DB with some locking approach (usually optimistic locking) in front and then asynchronously export the events to Kafka. I can live with this, though; the first problem is much more crucial to me.
3. The partition problem: the Kafka documentation mentions that the ordering guarantee exists only within a topic's partition. It also says that the partition is the basic unit of parallelism; in other words, if you want to parallelize work, you spread the messages across partitions (and brokers, of course). But this is a problem, because an event store in an event-sourced system needs the ordering guarantee, which means I'm forced to use only 1 partition for this use case if I absolutely need that guarantee. Is this correct?
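
To make the write-consistency point concrete, here is a minimal in-memory sketch of the optimistic-locking pattern described above. Everything in it is an assumption made for illustration: `InMemoryEventStore`, `ConcurrencyError` and the `outbox` list are hypothetical names standing in for a real database table and for whatever process later publishes the events to Kafka.

```python
class ConcurrencyError(Exception):
    """Raised when the stream changed since the writer last read it."""


class InMemoryEventStore:
    """Hypothetical stand-in for the database placed in front of Kafka."""

    def __init__(self):
        self._streams = {}  # entity_id -> list of events, in append order
        self.outbox = []    # events waiting to be exported to Kafka

    def load(self, entity_id):
        events = list(self._streams.get(entity_id, []))
        return events, len(events)  # (history, current version)

    def append(self, entity_id, expected_version, event):
        stream = self._streams.setdefault(entity_id, [])
        # Optimistic lock: reject the write if someone else appended first.
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{entity_id}: expected version {expected_version}, "
                f"found {len(stream)}"
            )
        stream.append(event)
        self.outbox.append((entity_id, event))


store = InMemoryEventStore()
_, version = store.load("order-1")
store.append("order-1", version, "OrderCreated")

# Two writers read the same version; only the first append succeeds.
_, seen_by_a = store.load("order-1")
_, seen_by_b = store.load("order-1")
store.append("order-1", seen_by_a, "ItemAdded")
try:
    store.append("order-1", seen_by_b, "OrderCancelled")
    conflict = False
except ConcurrencyError:
    conflict = True
```

In this sketch the second writer would have to reload the stream and retry; the rejected event never reaches the outbox, so it would never be exported.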

Even though this question is a bit open, what I really want to know is this: Have you used Kafka as your main event store in an event-sourced system? How have you dealt with the problem of recomposing entity instances from their command history (given that the topic has millions of entries, scanning the whole set is not an option)? Did you use only 1 partition, sacrificing potential concurrent consumers (given that the ordering guarantee is restricted to a single topic partition)?

Any specific or general feedback would be greatly appreciated, as this is a complex topic with several considerations.

Thanks in advance.

EDIT: There was a similar discussion 6 years ago here: Using Kafka as a (CQRS) Eventstore. Good idea? Consensus back then was also divided, and many of the people who find this approach convenient mention how Kafka natively handles huge amounts of real-time data. Nevertheless, the problem (for me at least) isn't related to that. It is about how inconvenient Kafka's capabilities are for rebuilding an entity's state, either by modelling topics as entity instances (where the explosion in the number of topics is undesirable) or by modelling topics as entity types (where the number of events within the topic makes reconstruction very slow and impractical).

932

asked Nov 08 '19 09:11

tony _008

1 Answer

Your understanding is mostly correct:

1. Kafka has no search, and definitely not by key. There's a seek to timestamp, but it's imperfect and not good for what you're trying to do.
2. Kafka actually supports a limited form of transactions (see exactly-once) these days, although they will be of no use if you interact with any system outside of Kafka.
3. The unit of everything in Kafka (event ordering, availability, replication) is a partition. There are no guarantees across partitions of the same topic.
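
A small simulation may help tie the first and third points together. It assumes events are keyed by entity ID and routed by a hash of the key, which is how Kafka's default partitioner treats keyed records; `zlib.crc32` is only a deterministic stand-in for the real hash, and the partition count, event names and `history` helper are made up for this sketch.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # arbitrary value for this sketch


def partition_for(key: str) -> int:
    # Stand-in for a key-hash partitioner; CRC32 is used only because it is
    # deterministic and available in the standard library.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# partition number -> append-only list of (key, event)
partitions = defaultdict(list)

events = [
    ("order-1", "OrderCreated"),
    ("order-2", "OrderCreated"),
    ("order-1", "ItemAdded"),
    ("order-3", "OrderCreated"),
    ("order-2", "OrderPaid"),
    ("order-1", "OrderShipped"),
]
for key, event in events:
    partitions[partition_for(key)].append((key, event))


def history(entity_id: str) -> list:
    # There is no index by key: finding one entity's events means scanning
    # the whole partition that the key maps to.
    log = partitions[partition_for(entity_id)]
    return [event for key, event in log if key == entity_id]
```

Because every event for a given key lands in the same partition, `history("order-1")` comes back in the order the events were produced, even with several partitions. What the sketch cannot avoid is the linear scan inside `history`, which is the retrieval problem raised in the question.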

None of this stops applications from using Kafka as the source of truth for their state, so long as:

1. Your problem can be "sharded" into topic partitions, so that you don't care about the order of events across partitions.
2. You're willing to "replay" an entire partition to bootstrap if/when you lose your local state.
3. You use log-compacted topics to try to keep a bound on their size (because you will need to replay them to bootstrap, see the previous point).
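
The "replay to bootstrap" condition can be sketched without a broker. In this illustration a partition is just a Python list read from offset 0; the record shapes, the `apply` fold and the `bootstrap` helper are hypothetical and not part of any Kafka API.

```python
def apply(state: dict, record: dict) -> dict:
    """Fold one event into the local state (hypothetical event shapes)."""
    order = state.setdefault(record["order_id"], {"items": [], "status": "new"})
    if record["type"] == "ItemAdded":
        order["items"].append(record["item"])
    elif record["type"] == "OrderPaid":
        order["status"] = "paid"
    return state


def bootstrap(partition_log: list) -> tuple:
    """Replay a whole partition; return (state, next offset to read)."""
    state = {}
    for record in partition_log:
        apply(state, record)
    return state, len(partition_log)


partition_log = [
    {"order_id": "order-1", "type": "OrderCreated"},
    {"order_id": "order-1", "type": "ItemAdded", "item": "book"},
    {"order_id": "order-2", "type": "OrderCreated"},
    {"order_id": "order-1", "type": "OrderPaid"},
]

# Local state was lost: rebuild it by replaying the partition from the start.
state, next_offset = bootstrap(partition_log)

# After bootstrapping, new records are applied incrementally from next_offset.
partition_log.append({"order_id": "order-2", "type": "ItemAdded", "item": "pen"})
for record in partition_log[next_offset:]:
    apply(state, record)
```

The cost of `bootstrap` grows with the length of the partition, which is why keeping the log bounded matters for this style of recovery.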

Both Samza and (IIUC) Kafka Streams back their state stores with log-compacted Kafka topics. Internally, Kafka stores offset and consumer group management data as a log-compacted topic, with brokers holding a "materialized view" in memory. When ownership of a partition of __consumer_offsets moves between brokers, the new leader replays the partition to rebuild this view.
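
As a rough model of why a compacted log is enough to rebuild such a view, the sketch below treats compaction as "keep the latest record per key". This is a simplification: real compaction runs in the background, leaves the most recent segment untouched and also handles deletions, none of which is modelled here. The commit records and both helper functions are made up for illustration.

```python
def compact(log: list) -> list:
    """Keep only the latest record per key, preserving the survivors' order."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [
        (key, value)
        for i, (key, value) in enumerate(log)
        if last_index[key] == i
    ]


def materialize(log: list) -> dict:
    """Replay a log into an in-memory key -> latest value view."""
    view = {}
    for key, value in log:
        view[key] = value
    return view


# Hypothetical offset commits: key = (group, topic, partition),
# value = committed offset.
commits = [
    (("billing", "orders", 0), 10),
    (("billing", "orders", 1), 7),
    (("billing", "orders", 0), 25),
    (("shipping", "orders", 0), 3),
    (("billing", "orders", 0), 40),
]

compacted = compact(commits)
view_from_full_log = materialize(commits)
view_from_compacted_log = materialize(compacted)
```

Replaying the shorter compacted log yields the same view as replaying every commit. The flip side is that compaction keeps the latest value per key rather than the full history, so it suits topics whose records carry current state (or snapshots) rather than topics that must retain every event of an entity.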

153

answered Sep 22 '22 15:09

radai

