Event Sourcing: concurrently creating conflicting events

I am trying to implement an Event Sourcing system using Kafka and have run into the following issue. During a new user sign-up I want to check whether the username the user provided is already taken. However, consider the case where two users are trying to sign up at the same time, providing the same username.

In my understanding of how ES works, the controller that processes the sign-up request will check whether the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. a Postgres DB). The problem is that the validation of the request is done against the materialized view, but the actual persistence to it happens later. So, because the two requests are being processed in parallel (by different service instances), they might both pass the validation, resulting in two NewUser messages. However, when the second controller tries to persist those two NewUser messages in the database, saving the second event will fail because of the violation of the uniqueness constraint for the username.

Any ideas on how to address this?

Thanks.

UPDATE:

In particular, I would like to verify whether the following are accepted approaches to the problem:

  1. use the username as the userId (restrictive)
  2. send an event to a topic partitioned by username and, once validation succeeds, publish a confirmation event to another topic (see the sketch below)
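
A minimal sketch of the first step of approach 2, using Kafka's Java producer. The topic name, payload, and userId are illustrative, not from the question; the point is that keying the record by username routes all claims for the same name to one partition, where a single consumer processes them in order:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    public class UsernameClaimProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Using the username as the record key means every claim for the
                // same username hashes to the same partition, so one consumer sees
                // them in order and can decide which claim wins.
                String username = "george";
                ProducerRecord<String, String> claim =
                    new ProducerRecord<>("username-claims", username, "{\"userId\":\"u-123\"}");
                producer.send(claim);
            }
        }
    }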
George asked May 12 '17




2 Answers

Initial validation against the materialized view won't be enough in most scenarios where you have constraints: there may always be relevant events that haven't been materialized yet. There are two main concurrency control approaches to ensure that correct results are generated:

1. Pessimistic approach: If you want to validate constraints before you publish an event, you need to lock the relevant resources (entity, aggregate or data set); a sketch follows the list below. While the lock is held, other service instances must not be able to publish events on these resources. After this point, to get the current state of your data:

  • You can wait until all events published before locking are materialized.
  • You can read current state from the database and apply events on it in a separate process.
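
A sketch of the pessimistic variant, assuming the materialized view lives in Postgres. pg_advisory_xact_lock is a standard Postgres primitive, but the users table and the overall flow are illustrative:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class PessimisticUsernameCheck {
        // Returns true if the username is free and the caller may publish the
        // NewUser event. The advisory lock is transaction-scoped, so the caller
        // should publish, wait for materialization, and only then commit.
        static boolean claimUsername(Connection conn, String username) throws SQLException {
            conn.setAutoCommit(false);
            // Serialize all checks for this username across service instances.
            try (PreparedStatement lock =
                     conn.prepareStatement("SELECT pg_advisory_xact_lock(hashtext(?))")) {
                lock.setString(1, username);
                lock.execute();
            }
            // Holding the lock, this read cannot race with another instance's
            // check-then-publish for the same username.
            try (PreparedStatement check =
                     conn.prepareStatement("SELECT 1 FROM users WHERE username = ?")) {
                check.setString(1, username);
                try (ResultSet rs = check.executeQuery()) {
                    return !rs.next(); // free if no row exists yet
                }
            }
        }
    }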

2. Optimistic approach: In this approach, you perform your validations after publishing events. To achieve this, you need to implement a feedback mechanism. The process which consumes events and performs validations should be able to publish validation results. You can perform the validations in-memory when possible. Otherwise, you can rely on your materialized data store.

Martin Kleppmann talks about a two-step solution to exactly this problem here and in his book. In this solution, there are two topics: "claims" and "registrations". First, you publish a claim to take the username, then you try to write it to the database, and finally you publish the result to the registrations topic. At a conceptual level, it follows the same steps as the second approach you mentioned. In the validation step, it avoids implementing validation logic and keeping secondary indexes in memory by relying on the database.
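
A sketch of that two-topic flow, using the Kafka Java client and JDBC. The topic names follow the answer; the usernames table (with a unique constraint on username) and the string payloads are assumptions:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.time.Duration;
    import java.util.List;

    public class UsernameClaimValidator {
        // Consumes claims, lets the database's uniqueness constraint act as the
        // single authority, and publishes the outcome to "registrations".
        static void run(KafkaConsumer<String, String> consumer,
                        KafkaProducer<String, String> producer,
                        Connection db) {
            consumer.subscribe(List.of("username-claims"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> claim : records) {
                    String username = claim.key();
                    String outcome;
                    try (PreparedStatement insert = db.prepareStatement(
                            "INSERT INTO usernames (username) VALUES (?)")) {
                        insert.setString(1, username);
                        insert.executeUpdate();
                        outcome = "accepted"; // this claim won the race
                    } catch (SQLException e) {
                        // A real system would check SQLState 23505 (unique_violation)
                        // rather than treating every SQL error as "taken".
                        outcome = "rejected";
                    }
                    producer.send(new ProducerRecord<>("registrations", username, outcome));
                }
            }
        }
    }

The sign-up service then watches the registrations topic (or queries a view built from it) to learn whether its claim was accepted.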

gorkem answered Sep 30 '22


During a new user sign-up I want to check if the username the user provided is already taken.

You may want to review Greg Young's essay on Set Validation.

In my understanding of how ES works the controller that processes the sign-up request will check if the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. Postgres DB).

That's a little bit different from the usual arrangement. (You may also want to review Greg's talk on polyglot data.)

Suppose we begin with two writers; that's fine, but if there is going to be a single point of truth, then you are going to need synchronization somewhere.

The usual arrangement is to use a form of optimistic concurrency: when processing a request, you keep a copy of your original state, then you do your calculation, and finally you send the book of record a `replace(originalState, newState)`.

So at this point, we have two writes racing toward the book of record:

replace(red,green)
replace(red,blue)

At the book of record, the writes are processed in series.

[...,replace(red,blue)...,replace(red,green)]

So when the book of record processes replace(red,blue), it performs a check that yes, the state is currently red, and swaps in blue. Later, when the book of record tries to process replace(red,green), the book of record performs the check, which fails because the state is no longer red.

So one of the writes has succeeded, and the other fails; the latter can propagate the failure outwards, or retry, or so on; precisely what happens depends on the specific mechanics in question. A retry should mean, of course, reloading the "original state", at which point the model would discover that some previous edit already claimed the username.
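
A minimal sketch of the `replace(originalState, newState)` check at the book of record, written here as a JDBC conditional update; the stream_heads table is hypothetical:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BookOfRecord {
        // Atomically replaces the stored state only if it still matches what the
        // writer originally read; returns false if another write got there first.
        static boolean replace(Connection db, String streamId,
                               String originalState, String newState) throws SQLException {
            try (PreparedStatement cas = db.prepareStatement(
                    "UPDATE stream_heads SET state = ? WHERE stream_id = ? AND state = ?")) {
                cas.setString(1, newState);
                cas.setString(2, streamId);
                cas.setString(3, originalState);
                return cas.executeUpdate() == 1; // 0 rows means the check failed
            }
        }
    }

With this in place, replace(red, blue) and replace(red, green) can race freely: the WHERE clause guarantees that exactly one succeeds, and the loser reloads the current state before retrying or reporting failure.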

Any ideas on how to address this?

A single writer per stream makes the rest of the problem pretty simple, by eliminating the ambiguity introduced by having multiple in-memory copies of the model.

Multiple writers using a synchronous write to the durable store is probably the most common design. It requires an event store that understands the idea of writing to a specific location in a stream -- aka "expected version".
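
One common way to get "expected version" semantics without a dedicated event store is a unique constraint on (stream_id, version); the events table below is illustrative:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class EventStreamAppender {
        // Appends an event at a specific position in a stream. If two writers
        // both read version N and both try to append at N + 1, the unique
        // constraint on (stream_id, version) lets exactly one insert succeed.
        static boolean append(Connection db, String streamId,
                              long expectedVersion, String payload) {
            try (PreparedStatement insert = db.prepareStatement(
                    "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?)")) {
                insert.setString(1, streamId);
                insert.setLong(2, expectedVersion + 1);
                insert.setString(3, payload);
                insert.executeUpdate();
                return true;
            } catch (SQLException e) {
                return false; // another writer appended at this version first
            }
        }
    }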

You can perform an asynchronous write, and then start doing other work until you get an acknowledgement that the write succeeded (or failed, or timed out...).
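
If the write goes to Kafka itself, the producer's asynchronous send already provides that acknowledgement hook; a sketch:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AsyncWriteExample {
        static void publish(KafkaProducer<String, String> producer,
                            ProducerRecord<String, String> record) {
            // send() returns immediately; the callback fires once the broker
            // acknowledges (or rejects) the write.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Failed or timed out: retry, or surface the error.
                } else {
                    // Acknowledged: the event is durably recorded.
                }
            });
            // The caller is free to do other work in the meantime.
        }
    }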

There's no magic -- if you want uniqueness (or any other sort of invariant enforcement, for that matter), then everybody needs to agree on a single authority, and anybody else who wants to propose a change won't know if it has been accepted without getting word back from the authority, and needs to be prepared for a rejected proposal.

(Note: this shouldn't be a surprise -- if you were using a traditional design with current state stored in a RDBMS, then your authority would be a user table in the database, with a uniqueness constraint on the username column, and the race would be between the two insert statements trying to finish their transaction first....)

VoiceOfUnreason answered Sep 30 '22