
EventSourcing race condition

Here is a nice article that describes what event sourcing (ES) is and how to work with it.

Everything there makes sense, but one image bothers me. Here it is:

[Image: ES example]

I understand that in distributed event-based systems we can achieve only eventual consistency. Still, how do we ensure that we don't book more seats than are available? This is especially a problem when there are many concurrent requests.

It may happen that n aggregates are populated with the same amount of reserved seats, and all of these aggregate instances allow reservations.

Asked by user1016265, Nov 15 '16 at 11:11



2 Answers

I understand that in distributed event-based systems we are able to achieve eventual consistency only. Anyway ... How do we avoid booking more seats than we have, especially with many concurrent requests?

All events are private to the command running them until the book of record acknowledges a successful write. So we don't share the events at all, and we don't report back to the caller, until we know that our version of "what happened next" was accepted by the book of record.

The write of events is analogous to a compare-and-swap of the tail pointer in the aggregate history. If another command has changed the tail pointer while we were running, our swap fails, and we have to mitigate/retry/fail.

In practice, this is usually implemented by having the write command to the book of record include an expected position for the write. (Example: ES-ExpectedVersion in GES).

The book of record is expected to reject the write if the expected position is in the wrong place. Think of the position as a unique key in a table in an RDBMS, and you have the right idea.
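To make the unique-key analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the stream name "show-42" and the append helper are assumptions made up for illustration; this is not how GES or any particular event store is implemented. It only shows the collision case: two writers that both expect the stream to be at position 0 try to claim position 1, and the second insert is rejected by the primary key.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create a throwaway in-memory 'book of record' (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " stream_id TEXT NOT NULL,"
        " position INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (stream_id, position))"
    )
    return conn


def append(conn: sqlite3.Connection, stream_id: str,
           expected_position: int, payload: str) -> bool:
    """Write one event at expected_position + 1.

    Returns False when another writer already occupies that position.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?)",
                (stream_id, expected_position + 1, payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_store()
# Both writers loaded an empty history, so both expect position 0.
first = append(conn, "show-42", 0, "SeatReserved")
second = append(conn, "show-42", 0, "SeatReserved")
print(first, second)
```

The losing writer learns about the conflict from the failed write itself, so nothing it produced ever becomes visible to anyone else.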

This means, effectively, that the writes to the event stream are actually consistent: the book of record only permits the write if the position you write to is correct, which means that the position hasn't changed since you loaded your copy of the history.

It's typical for commands to read event streams directly from the book of record, rather than the eventually consistent read models.

It may happen that n AggregateRoots will be populated with the same amount of reserved seats, which means that having validation in the reserve method won't help. Then n AggregateRoots will emit the event of a successful reservation.

Every bit of state needs to be supervised by a single aggregate root. You can have n different copies of that root running, all competing to write to the same history, but the compare and swap operation will only permit one winner, which ensures that "the" aggregate has a single internally consistent history.
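As a rough illustration of the "n copies, one winner" point, the sketch below loads two copies of the same aggregate from the same history, lets both decide to take the last seat, and then has both try to write. All names here (Stream, Show, ConcurrencyError, the event dictionaries) are hypothetical, and the in-memory stream stands in for a real book of record, which would have to perform the version check and the write atomically.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Hypothetical error: the stream moved on since the history was loaded."""


@dataclass
class Stream:
    events: list = field(default_factory=list)

    def append(self, new_events: list, expected_version: int) -> None:
        # Compare-and-swap on the tail: only write if nobody else has.
        # A real store must make this check-and-write atomic.
        if len(self.events) != expected_version:
            raise ConcurrencyError("stream is no longer at the expected version")
        self.events.extend(new_events)


@dataclass
class Show:
    capacity: int
    reserved: int = 0
    version: int = 0

    @classmethod
    def load(cls, capacity: int, history: list) -> "Show":
        show = cls(capacity)
        for event in history:
            show.reserved += event["seats"]
            show.version += 1
        return show

    def reserve(self, seats: int) -> list:
        if self.reserved + seats > self.capacity:
            raise ValueError("not enough seats left")
        return [{"type": "SeatsReserved", "seats": seats}]


stream = Stream()
stream.append([{"type": "SeatsReserved", "seats": 9}], expected_version=0)

# Two copies load the same history: each sees 9 of 10 seats taken.
copy_a = Show.load(10, stream.events)
copy_b = Show.load(10, stream.events)

# Both validations pass, because both copies are equally stale.
events_a = copy_a.reserve(1)
events_b = copy_b.reserve(1)

stream.append(events_a, expected_version=copy_a.version)  # winner
try:
    stream.append(events_b, expected_version=copy_b.version)
    loser_accepted = True
except ConcurrencyError:
    loser_accepted = False

total_reserved = sum(event["seats"] for event in stream.events)
print(loser_accepted, total_reserved)
```

The validation inside reserve is not what prevents the overbooking here; the rejected write is. The losing copy has to reload and decide again, at which point its validation sees a full house.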

Answered by VoiceOfUnreason, Sep 28 '22 at 01:09


There are a couple of ways to deal with such a scenario.

First off, an event stream has a current version, which is the version of the last event added. This means that you would not, or should not, be able to persist the event stream if the stream is no longer at the version it had when you loaded it. Since the very first write would cause the version of the event stream to be increased, the second write would not be permitted. Since events are not emitted, per se, but are rather a result of the event sourcing, we would not have the type of race condition in your example.

If your commands are processed behind a queue, any failures should be retried. Should it not be possible to process the request, you would enter the normal "I'm sorry, Dave. I'm afraid I can't do that" scenario by letting the user know that they should try something else.
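A retry loop along these lines could look like the sketch below. It is only an illustration under stated assumptions: Stream, ConcurrencyError, the reserve function and its return strings are invented names, events are reduced to plain seat counts, and the before_write hook exists purely to simulate a competing writer so the conflict can be reproduced in a single thread.

```python
class ConcurrencyError(Exception):
    """Hypothetical error raised when the stream version has moved on."""


class Stream:
    def __init__(self) -> None:
        self.events: list[int] = []  # each event is just a seat count here

    def append(self, seats: int, expected_version: int) -> None:
        if len(self.events) != expected_version:
            raise ConcurrencyError
        self.events.append(seats)


def reserve(stream: Stream, capacity: int, seats: int,
            max_attempts: int = 3, before_write=None) -> str:
    for attempt in range(max_attempts):
        history = list(stream.events)      # reload on every attempt
        if sum(history) + seats > capacity:
            return "rejected: not enough seats"
        if before_write is not None:
            before_write(attempt)          # simulation hook only
        try:
            stream.append(seats, expected_version=len(history))
            return "reserved"
        except ConcurrencyError:
            continue                       # someone else won; try again
    return "rejected: too much contention"


stream = Stream()


def rival(attempt: int) -> None:
    # A competing command sneaks in one seat during our first attempt.
    if attempt == 0:
        stream.append(1, expected_version=len(stream.events))


first_result = reserve(stream, capacity=2, seats=1, before_write=rival)
second_result = reserve(stream, capacity=2, seats=1)
print(first_result, second_result, stream.events)
```

The first call loses the race once, reloads, and succeeds on the retry. The second call finds the show full and takes the "I can't do that" path instead of retrying forever.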

Another option is to start the processing by issuing an update against some table row to serialize any calls to the aggregate. It is probably not the most elegant approach, but it does cause a system-wide block on the processing.

I guess, to a large extent, one cannot really trust the read store when it comes to transactional processing.

Hope that helps :)

Answered by Eben Roux, Sep 28 '22 at 01:09