I'm currently working on a prototype in C# that utilises CQRS and event sourcing and I've hit a performance bottleneck in my projections to an SQL database.
My first prototype was built with Entity Framework 6, code first. This choice was made primarily to get going and because the read side would benefit from LINQ.
Every (applicable) event is consumed by multiple projections, which either create or update the corresponding entity.
Such a projection currently looks like this:
public async Task HandleAsync(ItemPlacedIntoStock @event)
{
    var bookingList = new BookingList
    {
        Date = @event.Date,
        DeltaItemQuantity = @event.Quantity,
        IncomingItemQuantity = @event.Quantity,
        OutgoingItemQuantity = 0,
        Item = @event.Item,
        Location = @event.Location,
        Warehouse = @event.Warehouse
    };

    using (var repository = new BookingListRepository())
    {
        repository.Add(bookingList);
        await repository.Save();
    }
}
This doesn't perform well, most likely because I call DbContext.SaveChanges() in the IRepository.Save() method, once for every event.
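For completeness, the repository is a thin wrapper around the EF 6 context, roughly like this (BookingContext and the BookingLists set are placeholder names; other IRepository members are omitted):

public class BookingListRepository : IDisposable
{
    // Placeholder EF 6 code-first context with a DbSet<BookingList>.
    private readonly BookingContext _context = new BookingContext();

    public void Add(BookingList bookingList)
    {
        _context.BookingLists.Add(bookingList);
    }

    public Task Save()
    {
        // One SaveChanges per handled event - this is where the time goes.
        return _context.SaveChangesAsync();
    }

    public void Dispose()
    {
        _context.Dispose();
    }
}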
What options should I explore next? I don't want to spend days chasing ideas that might prove to be only marginally better.
I currently see the following options:
I expect to see millions of events because we plan to source a large legacy application and migrate data in the form of events. New projections will also be added often enough that processing speed is a real issue.
Benchmarks (updated):
- TableAdapter (new DataSet and new TableAdapter on each iteration): ~2,500 inserts/second. Did not test with the projection pipeline, but standalone.
- TableAdapter that does not SELECT after inserting: ~3,000 inserts/second
- TableAdapter batch-insert of 10,000 rows (single DataSet, 10,000 rows in-memory): >10,000 inserts/second (my sample size and window were too small)
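For reference, the batch-insert figure was measured with plain ADO.NET batching; a roughly equivalent sketch using a SqlDataAdapter with UpdateBatchSize (instead of a designer-generated TableAdapter; the table and column names are made up) looks like this:

using System.Data;
using System.Data.SqlClient;

// Sends all added rows of an in-memory DataTable as batched INSERTs.
// "BookingList", "Date" and "DeltaItemQuantity" are placeholder names.
static void BulkInsert(string connectionString, DataTable table)
{
    using (var connection = new SqlConnection(connectionString))
    using (var insert = new SqlCommand(
        "INSERT INTO BookingList ([Date], DeltaItemQuantity) VALUES (@Date, @DeltaItemQuantity)",
        connection))
    using (var adapter = new SqlDataAdapter())
    {
        insert.Parameters.Add("@Date", SqlDbType.DateTime, 0, "Date");
        insert.Parameters.Add("@DeltaItemQuantity", SqlDbType.Int, 0, "DeltaItemQuantity");

        // No SELECT after inserting: don't refresh the row from the database.
        insert.UpdatedRowSource = UpdateRowSource.None;

        adapter.InsertCommand = insert;
        adapter.UpdateBatchSize = 1000; // statements sent per round trip

        connection.Open();
        adapter.Update(table); // issues the batched INSERTs for all added rows
    }
}

The rows are accumulated in the DataTable in memory and then flushed in batches of UpdateBatchSize statements per round trip, which is what makes this so much faster than one insert per call.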
I've seen performance improvements of several orders of magnitude, even with Entity Framework, when batching the commits and improving my overall projection engine.
This was achieved by using the following technologies and tools:
- A TransformBlock with a BoundedCapacity somewhere over 100. The maximum degree of parallelism was Environment.ProcessorCount (i.e. 4 or 8). I saw a massive increase in performance with a queue size of 100-200 vs. 10: from 200-300 events per second to 10,000 events per second. This most likely means that a buffer of 10 was causing too many underruns and thus committed the unit of work too often.
- An ActionBlock that consumes the events in batches and commits the unit of work once per batch.
The values of a batch size of 1000 events and a queue size of 200 are the result of experimentation. This also shows further options for improvement by tweaking these values for each projection independently. A projection that adds a new row for every event slows down considerably when using a batch size of 10,000, while other projections that merely update a few entities benefit from a larger batch size.
The deserialisation queue size is also vital for good performance.
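To make the shape of that pipeline concrete, here is a minimal sketch of how the blocks can be wired together with TPL Dataflow. The BuildProjectionPipeline method, the BatchBlock used for grouping, and the two delegate parameters are my own placeholders; the capacity, parallelism, and batch-size values are the ones quoted above.

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Deserialisation runs in parallel behind a bounded queue; commits happen once per batch.
static ITargetBlock<byte[]> BuildProjectionPipeline(
    Func<byte[], object> deserializeEvent,
    Func<object[], Task> projectBatchAsync)
{
    var deserialize = new TransformBlock<byte[], object>(
        deserializeEvent,
        new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 200,                              // deserialisation queue size
            MaxDegreeOfParallelism = Environment.ProcessorCount // 4 or 8 in my case
        });

    var batch = new BatchBlock<object>(1000);                   // batch size found by experimentation

    var commit = new ActionBlock<object[]>(
        projectBatchAsync,                                      // one unit of work per batch
        new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });

    deserialize.LinkTo(batch, new DataflowLinkOptions { PropagateCompletion = true });
    batch.LinkTo(commit, new DataflowLinkOptions { PropagateCompletion = true });

    return deserialize; // post raw events here; call Complete() when the stream ends
}

Events are pushed into the returned block with SendAsync, which waits whenever the bounded queue is full; that backpressure is what keeps the batches large and the commits infrequent.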
So, TL;DR:
- Entity Framework is fast enough to handle up to 10,000 modifications per second, on each parallel thread.
- Utilise your unit of work and avoid committing every single change, especially in CQRS, where the projection is the only thread making any changes to the data (see the sketch below).
- Properly interleave parallel tasks; don't just blindly async everything.
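As an illustration of the second point, a minimal sketch of a batched projection handler with Entity Framework, assuming a placeholder ProjectionContext with a BookingLists set (the real code keeps the context behind the repository/unit-of-work abstraction):

using System.Collections.Generic;
using System.Threading.Tasks;

// Handles a whole batch of events with a single DbContext and a single SaveChangesAsync,
// instead of one SaveChanges per event.
public async Task HandleBatchAsync(IReadOnlyList<ItemPlacedIntoStock> events)
{
    using (var context = new ProjectionContext())
    {
        foreach (var @event in events)
        {
            context.BookingLists.Add(new BookingList
            {
                Date = @event.Date,
                DeltaItemQuantity = @event.Quantity,
                IncomingItemQuantity = @event.Quantity,
                OutgoingItemQuantity = 0,
                Item = @event.Item,
                Location = @event.Location,
                Warehouse = @event.Warehouse
            });
        }

        // One commit per batch instead of one per event.
        await context.SaveChangesAsync();
    }
}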