I am trying to implement an Event Sourcing system using Kafka and have run into the following issue. During a new user sign-up I want to check whether the username the user provided is already taken. However, consider the case where two users try to sign up at the same time with the same username.
In my understanding of how ES works, the controller that processes the sign-up request will check whether the request is valid, then send a new event (e.g. `NewUser`) to Kafka, and finally that event will be picked up by another controller which persists it in a materialized view (e.g. a Postgres DB). The problem is that the validation of the request is done against the materialized view, but the actual persistence to it happens later. Because the two requests are processed in parallel (by different service instances), they might both pass validation, resulting in two `NewUser` messages. When the second controller then tries to persist those two `NewUser` messages in the database, saving the second event will fail because of the violation of the uniqueness constraint for the username.
Any ideas on how to address this?
Thanks.
Initial validation against the materialized view won't be enough in most scenarios where you have constraints: there can always be relevant events that haven't been materialized yet. There are two main concurrency control approaches to ensure that correct results are generated:
1. Pessimistic approach: if you want to validate constraints before you publish an event, you need to lock the relevant resources (entity, aggregate, or data set). Locking means your services must not be able to publish events on these resources while the lock is held. After this point, to validate against the current state of your data, you also have to wait until every event published before the lock was taken has been materialized (see the sketch after this list).
2. Optimistic approach: in this approach, you perform your validations after publishing events. To achieve this, you need to implement a feedback mechanism: the process that consumes events and performs validations should be able to publish the validation results. You can perform the validations in memory when possible; otherwise, you can rely on your materialized data store.
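As a sketch of the pessimistic option (the first item above), assuming the materialized view lives in Postgres and is reached over JDBC; the `users` table and the `publishNewUserEvent()` helper are hypothetical:

```java
// A minimal sketch of the pessimistic approach, assuming a Postgres
// materialized view reached over JDBC. The users table and the
// publishNewUserEvent() helper are hypothetical.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PessimisticSignUp {

    public boolean signUp(Connection db, String username) throws Exception {
        db.setAutoCommit(false);
        try {
            // Serialize sign-ups per username with a transaction-scoped
            // advisory lock; a concurrent request for the same name blocks here.
            try (PreparedStatement lock = db.prepareStatement(
                    "SELECT pg_advisory_xact_lock(hashtext(?))")) {
                lock.setString(1, username);
                lock.execute();
            }
            // Caveat from the text above: before validating, you must also
            // wait until all events published before the lock was taken
            // have been materialized into this view.
            try (PreparedStatement check = db.prepareStatement(
                    "SELECT 1 FROM users WHERE username = ?")) {
                check.setString(1, username);
                try (ResultSet rs = check.executeQuery()) {
                    if (rs.next()) {
                        return false; // username already taken
                    }
                }
            }
            publishNewUserEvent(username); // publish NewUser to Kafka
            return true;
        } finally {
            db.commit(); // ending the transaction releases the advisory lock
        }
    }

    private void publishNewUserEvent(String username) {
        // Kafka producer call goes here.
    }
}
```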
Martin Kleppmann talks about a two-step solution to exactly this problem here and in his book. In this solution there are two topics: "claims" and "registrations". First, you publish a claim to take the username, then you try to write it to the database, and finally you publish the result to the registrations topic. At a conceptual level, it follows the same steps as the second approach mentioned above. In the validation step, it avoids implementing validation logic and keeping secondary indexes in memory by relying on the database.
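A minimal sketch of that claims/registrations flow with the plain Kafka Java clients is below; the `usernames` table, the string serialization, and the connection details are assumptions for illustration, and error handling and offset management are omitted.

```java
// A minimal sketch of the claims/registrations flow with the plain Kafka
// Java clients. The topic names come from the answer; the usernames table,
// string serialization, and connection details are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClaimsProcessor {

    public static void main(String[] args) throws Exception {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "claims-processor");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> claims = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> results = new KafkaProducer<>(producerProps);
             Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/app")) {

            claims.subscribe(List.of("claims"));
            while (true) {
                ConsumerRecords<String, String> batch = claims.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> claim : batch) {
                    String username = claim.key();
                    String outcome;
                    // The unique index on usernames is the actual arbiter:
                    // exactly one of two racing claims inserts successfully.
                    try (PreparedStatement ins = db.prepareStatement(
                            "INSERT INTO usernames (username) VALUES (?)")) {
                        ins.setString(1, username);
                        ins.executeUpdate();
                        outcome = "registered";
                    } catch (SQLException uniqueViolation) {
                        outcome = "rejected"; // name was already claimed
                    }
                    // Feed the validation result back so the requester
                    // learns whether its claim won.
                    results.send(new ProducerRecord<>("registrations", username, outcome));
                }
            }
        }
    }
}
```

If the claims topic is partitioned by username, all claims for a given name are handled in order by a single consumer, so the outcome is deterministic and the unique index in the database picks the winner.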
During a new user sign-up I want to check if the username the user provided is already taken.
You may want to review Greg Young's essay on Set Validation.
In my understanding of how ES works the controller that processes the sign-up request will check if the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. Postgres DB).
That's a little bit different from the usual arrangement. (You may also want to review Greg's talk on polyglot data.)
Suppose we begin with two writers; that's fine, but if there is going to be a single point of truth, then you are going to need synchronization somewhere.
The usual arrangement is to use a form of optimistic concurrency; when processing a request, you reserve a copy of your original state, then you do your calculation, and finally you send the book of record a `replace(originalState, newState)`.
So at this point, we have two writes racing toward the book of record:
replace(red,green)
replace(red,blue)
At the book of record, the writes are processed in series:
[...,replace(red,blue)...,replace(red,green)]
So when the book of record processes `replace(red,blue)`, it performs a check that yes, the state is currently red, and swaps in blue. Later, when the book of record tries to process `replace(red,green)`, the check fails because the state is no longer red.
So one of the writes has succeeded and the other has failed; the latter can propagate the failure outwards, or retry, or so on; precisely what happens depends on the specific mechanics in question. A retry should, of course, mean reloading the "original state", at which point the model would discover that some previous edit already claimed the username.
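For illustration, if the book of record keeps current state in a SQL table (the `records` table here is made up), the check-and-swap can be a single conditional UPDATE:

```java
// A compare-and-swap sketch of the book of record's check: the swap
// succeeds only if the state is still the one the writer originally read.
// The records table and its columns are hypothetical.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BookOfRecord {

    // Returns true when the swap wins the race, false when the state has moved on.
    static boolean replace(Connection db, String id,
                           String originalState, String newState) throws SQLException {
        try (PreparedStatement cas = db.prepareStatement(
                "UPDATE records SET state = ? WHERE id = ? AND state = ?")) {
            cas.setString(1, newState);
            cas.setString(2, id);
            cas.setString(3, originalState);
            // Zero rows updated means another writer replaced the state first;
            // the caller reloads the current state and retries or gives up.
            return cas.executeUpdate() == 1;
        }
    }
}
```

Running the two racing writes through this, exactly one of `replace(db, id, "red", "green")` and `replace(db, id, "red", "blue")` returns true, matching the serialized sequence above.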
Any ideas on how to address this?
Single writer per stream makes the rest of the problem pretty simple, by eliminating the ambiguity introduced by having multiple in-memory copies of the model.
Multiple writers using a synchronous write to the durable store is probably the most common design. It requires an event store that understands the idea of writing to a specific location in a stream -- aka "expected version".
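As a sketch of what "expected version" can look like when built on a relational store (the schema here is assumed, not a real event store's API):

```java
// Sketch of "write to a specific location in a stream" on top of a
// relational store. Assumed schema: events(stream_id, version, payload)
// with PRIMARY KEY (stream_id, version); names are hypothetical.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class EventStore {

    // Returns true if the event landed at expectedVersion + 1, false if
    // another writer got there first (the caller's view is stale).
    static boolean append(Connection db, String streamId,
                          long expectedVersion, String payload) throws SQLException {
        try (PreparedStatement ins = db.prepareStatement(
                "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?)")) {
            ins.setString(1, streamId);
            ins.setLong(2, expectedVersion + 1);
            ins.setString(3, payload);
            ins.executeUpdate();
            return true;
        } catch (SQLException e) {
            // 23505 is the standard SQLSTATE for a unique violation: the
            // slot at expectedVersion + 1 was already taken.
            if ("23505".equals(e.getSQLState())) {
                return false;
            }
            throw e;
        }
    }
}
```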
You can perform an asynchronous write, and then start doing other work until you get an acknowledgement that the write succeeded (or failed, or timed out).
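For example, Kafka's producer already works this way: `send()` returns a `Future`, so a sketch of the asynchronous write might look like this (topic name and payload are made up):

```java
// Sketch of the asynchronous variant: fire the write, do other work, then
// check the acknowledgement. Kafka's producer send() returns a Future;
// the topic name and payload here are made up.
import java.util.Properties;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AsyncWrite {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> ack =
                    producer.send(new ProducerRecord<>("user-events", "alice", "NewUser"));

            // ... do other useful work while the write is in flight ...

            try {
                RecordMetadata meta = ack.get(5, TimeUnit.SECONDS); // wait for the ack
                System.out.println("write acknowledged at offset " + meta.offset());
            } catch (Exception failedOrTimedOut) {
                // The write failed or we gave up waiting: retry, or
                // propagate the failure, exactly as discussed above.
            }
        }
    }
}
```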
There's no magic -- if you want uniqueness (or any other sort of invariant enforcement, for that matter), then everybody needs to agree on a single authority, and anybody else who wants to propose a change won't know if it has been accepted without getting word back from the authority, and needs to be prepared for a rejected proposal.
(Note: this shouldn't be a surprise -- if you were using a traditional design with current state stored in a RDBMS, then your authority would be a user table in the database, with a uniqueness constraint on the username column, and the race would be between the two insert statements trying to finish their transaction first....)