
Concurrent writes for event sourcing on top of Kafka


I've been considering using Apache Kafka as the event store in an event sourcing setup. Published events will be associated with specific resources, delivered to a topic associated with the resource type, and sharded into partitions by resource id. So, for instance, creating a resource of type Folder with id 1 would produce a FolderCreate event, delivered to the "folders" topic in the partition obtained by hashing the id 1 across the topic's total number of partitions. However, I don't know how to handle concurrent events that could make the log inconsistent.
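For concreteness, here is a minimal sketch of that setup using the Java kafka-clients producer. The broker address and the JSON payload shape are illustrative, not part of the question:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class FolderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record key is the resource id, so the default partitioner
            // hashes it and every event for folder 1 lands in the same
            // partition of the "folders" topic, preserving per-folder order.
            producer.send(new ProducerRecord<>(
                "folders", "1", "{\"type\":\"FolderCreate\",\"folderId\":1}"));
        }
    }
}
```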

The simplest scenario is two concurrent actions that can invalidate each other, such as one that updates a folder and one that destroys that same folder. In that case the partition for that topic could end up containing the invalid sequence [FolderDestroy, FolderUpdate]. That situation is often fixed by versioning the events, as explained here, but Kafka does not support such a feature.

What can be done to ensure the consistency of the Kafka log itself in those cases?

asked Jun 04 '17 by Jesuspc



1 Answer

I think it's probably possible to use Kafka for event sourcing of aggregates (in the DDD sense), or 'resources'. Some notes:

  1. Serialise writes per partition, using a single process per partition (or set of partitions) to manage this. Ensure you send messages serially down the same Kafka connection, and use acks=all before reporting success to the command sender if you can't afford rollbacks. Have the producer process keep track of the last successfully written event offset/version for each resource, so it can do the optimistic check itself before sending the message (see the sketch after this list).
  2. Since a write failure might be reported even when the write actually succeeded, you need to retry writes and handle deduplication, for example by including a unique ID in each event, or by reinitialising the producer and re-reading (recent messages in) the stream to see whether the write actually went through.
  3. To write multiple events atomically, just publish a composite event containing a list of events.
  4. Lookup by resource id: read all events from a partition at startup (or all events since a particular cross-resource snapshot) and keep the current state either in RAM or cached in a DB.
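To make points 1 and 2 concrete, here is a minimal sketch of a single-writer guard for one partition. PartitionWriter, FolderEvent and the in-memory version map are hypothetical names, not Kafka APIs; the point is only that one process owns the partition, performs the optimistic version check before producing, and stamps each event with a unique ID so retries after an ambiguous failure can be deduplicated:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical single-writer guard for one partition: it serialises all
// writes for the resources in that partition and does the optimistic
// check itself (point 1). Event ids support the dedup in point 2.
public class PartitionWriter {
    // Last successfully written version per resource id in this partition.
    private final Map<Long, Long> currentVersions = new HashMap<>();

    // Returns the event to publish, or throws if the command is stale.
    public synchronized FolderEvent tryWrite(long resourceId, long expectedVersion, String type) {
        long current = currentVersions.getOrDefault(resourceId, 0L);
        if (expectedVersion != current) {
            // Another command already advanced this resource; reject instead of
            // letting an invalid sequence like [FolderDestroy, FolderUpdate] in.
            throw new IllegalStateException(
                "expected version " + expectedVersion + " but resource is at " + current);
        }
        long next = current + 1;
        currentVersions.put(resourceId, next);
        // The unique event id lets consumers (and this writer itself, when it
        // re-reads the stream after an ambiguous failure) deduplicate retries.
        return new FolderEvent(UUID.randomUUID().toString(), resourceId, next, type);
    }

    public record FolderEvent(String eventId, long resourceId, long version, String type) {}
}
```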

https://issues.apache.org/jira/browse/KAFKA-2260 would solve point 1 in a simpler way, but seems to be stalled.

Kafka Streams appears to provide a lot of this for you. For example, point 4 is a KTable, which your event producer can use to work out whether an event is valid for the current resource state before sending it.
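As an illustration, a Kafka Streams sketch that folds the "folders" topic into a KTable of current per-folder state. The application id, store name and the toy apply transition are assumptions for the example, not a prescribed design:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import java.util.Properties;

public class FolderStateTable {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "folder-state");      // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative broker

        StreamsBuilder builder = new StreamsBuilder();
        // Fold the per-folder event stream into the latest state; the event
        // producer can consult this table to check whether a command is valid
        // for the current resource state before emitting an event.
        KTable<String, String> folderState = builder
            .stream("folders", Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey()
            .reduce((previous, event) -> apply(previous, event),
                    Materialized.as("folder-state-store"));

        new KafkaStreams(builder.build(), props).start();
    }

    // Hypothetical state transition: a real system would parse the event
    // and update a richer state object instead of returning raw strings.
    private static String apply(String previousState, String event) {
        return event.contains("FolderDestroy") ? "DELETED" : event;
    }
}
```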

answered Oct 11 '22 by TomW