Improve performance of event sourcing projections to RDBMS (SQL) via .NET

I'm currently working on a prototype in C# that utilises CQRS and event sourcing, and I've hit a performance bottleneck in my projections to an SQL database.

My first prototype was built with Entity Framework 6, code first. This choice was made primarily to get going and because the read side would benefit from LINQ.

Every (applicable) event is consumed by multiple projections, which either create or update the corresponding entity.

Such a projection currently looks like this:

public async Task HandleAsync(ItemPlacedIntoStock @event)
{
    var bookingList = new BookingList();
    bookingList.Date = @event.Date;
    bookingList.DeltaItemQuantity = @event.Quantity;
    bookingList.IncomingItemQuantity = @event.Quantity;
    bookingList.OutgoingItemQuantity = 0;
    bookingList.Item = @event.Item;
    bookingList.Location = @event.Location;
    bookingList.Warehouse = @event.Warehouse;

    using (var repository = new BookingListRepository())
    {
        repository.Add(bookingList);
        await repository.Save();
    }
}

This doesn't perform very well, most likely because I call DbContext.SaveChanges() in the IRepository.Save() method, once for each event.

What options should I explore next? I don't want to spend days chasing ideas that might prove only marginally better.

I currently see the following options:

  • Stick with EF, but batch-process the events (i.e. save and renew the context every X events) as long as the projection is running behind (see the sketch below this list).
  • Try to do more low-level SQL, for example with ADO.NET.
  • Don't use SQL to store the projections (i.e. use NoSQL)

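For illustration, here is a minimal sketch of what the first option could look like. The BookingContext type and the batch size are assumptions for this sketch, not a tuned implementation; the idea is simply to call SaveChanges once per batch instead of once per event.

using System.Collections.Generic;
using System.Data.Entity;
using System.Threading.Tasks;

public class BookingContext : DbContext // hypothetical context for this sketch
{
    public DbSet<BookingList> BookingLists { get; set; }
}

public static class BatchedProjection
{
    public static async Task ProjectBatchAsync(
        IEnumerable<ItemPlacedIntoStock> events, int batchSize = 1000)
    {
        var context = new BookingContext();
        context.Configuration.AutoDetectChangesEnabled = false; // cheap win for bulk adds in EF6
        try
        {
            var count = 0;
            foreach (var @event in events)
            {
                context.BookingLists.Add(new BookingList
                {
                    Date = @event.Date,
                    DeltaItemQuantity = @event.Quantity,
                    IncomingItemQuantity = @event.Quantity,
                    OutgoingItemQuantity = 0,
                    Item = @event.Item,
                    Location = @event.Location,
                    Warehouse = @event.Warehouse
                });

                if (++count % batchSize == 0)
                {
                    await context.SaveChangesAsync();
                    context.Dispose(); // a fresh context keeps the change tracker small
                    context = new BookingContext();
                    context.Configuration.AutoDetectChangesEnabled = false;
                }
            }
            await context.SaveChangesAsync(); // flush the final partial batch
        }
        finally
        {
            context.Dispose();
        }
    }
}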
I expect to see millions of events because we plan to event-source a large legacy application and migrate its data in the form of events. New projections will also be added often enough that processing speed is a real issue.

Benchmarks:

  • The current solution (EF, save after every event) processes ~200 events per second (per projection). It does not scale directly with the number of active projections (i.e. N projections process fewer than N * 200 events/second).
  • When the projections aren't saving the context, the number of events/second increases only marginally (less than double).
  • When the projections don't do anything (a single return statement), the processing speed of my prototype pipeline is ~30.000 events/second globally.

Updated benchmarks:

  • Single-threaded inserts via ADO.NET TableAdapter (new DataSet and new TableAdapter on each iteration): ~2.500 inserts/second. Tested standalone, not within the projection pipeline.
  • Single-threaded inserts via an ADO.NET TableAdapter that does not SELECT after inserting: ~3.000 inserts/second.
  • Single-threaded ADO.NET TableAdapter batch-insert of 10.000 rows (single DataSet, 10.000 rows in-memory): >10.000 inserts/second (my sample size and window were too small). See the sketch below.
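For reference, a minimal sketch of the batched ADO.NET insert, using a plain SqlDataAdapter in place of the designer-generated TableAdapter (a TableAdapter wraps the same mechanism). The table and column names are assumptions; the DataTable is expected to contain the rows to insert in the Added state.

using System.Data;
using System.Data.SqlClient;

public static class AdoBatchInsert
{
    public static void Insert(DataTable bookingLists, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter())
        {
            var insert = new SqlCommand(
                "INSERT INTO dbo.BookingList (Date, Item, DeltaItemQuantity) " +
                "VALUES (@Date, @Item, @DeltaItemQuantity)",
                connection);
            insert.Parameters.Add("@Date", SqlDbType.DateTime2, 0, "Date");
            insert.Parameters.Add("@Item", SqlDbType.NVarChar, 64, "Item");
            insert.Parameters.Add("@DeltaItemQuantity", SqlDbType.Int, 0, "DeltaItemQuantity");
            insert.UpdatedRowSource = UpdateRowSource.None; // no SELECT-back after inserting

            adapter.InsertCommand = insert;
            adapter.UpdateBatchSize = 10000; // send inserts to the server in large batches
            adapter.Update(bookingLists);    // inserts every row in the Added state
        }
    }
}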
asked May 24 '16 by urbanhusky




1 Answer

I've seen performance improvements of several orders of magnitude, even with Entity Framework, when batching the commits and improving my overall projection engine.

  • Each projection is a separate subscription on the Event Store. This allows each projection to run at its maximum speed. The theoretical maximum of my pipeline on my machine was 40.000 events per second (possibly more; I ran out of events to sample with).
  • Each projection maintains a queue of events and deserialises the JSON into POCOs. Multiple deserialisations per projection run in parallel. I also switched from DataContract serialisation to Json.NET.
  • Each projection supports the notion of a unit of work. The unit of work is committed after processing 1000 events or when the deserialisation queue is empty (i.e. I am either at the head position or experienced a buffer underrun). This means that a projection commits more often when it is only a few events behind.
  • I made use of async TPL processing, interleaving fetching, queueing, processing, tracking and committing.

This was achieved by using the following technologies and tools:

  • The ordered, queued and parallel deserialisation into POCOs is done via a TPL Dataflow TransformBlock with a BoundedCapacity somewhere over 100. The maximum degree of parallelism was Environment.ProcessorCount (i.e. 4 or 8). I saw a massive increase in performance with a queue size of 100-200 vs. 10: from 200-300 events to 10.000 events per second. This most likely means that a buffer of 10 caused too many underruns and thus committed the unit of work too often.
  • Processing is dispatched asynchronously from a linked ActionBlock.
  • Each time an event is deserialised, I increment a counter for pending events.
  • Each time an event is processed, I increment a counter for processed events.
  • The unit of work is committed after 1000 processed events, or whenever the deserialisation buffer runs out (number of pending events = number of processed events). I reduce both counters by the number of processed events rather than resetting them to 0, because other threads might have increased the number of pending events in the meantime. (A sketch of this pipeline follows below.)
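Below is a minimal sketch of how such a pipeline can be wired up with TPL Dataflow. The RawEvent and IProjection types, and the dispatch of deserialised events to typed handlers, are assumptions for illustration; only the block wiring, the bounded queue and the counter-based commit logic follow the description above.

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Newtonsoft.Json;

public class RawEvent { public string Json; }  // hypothetical wrapper for a stored event

public interface IProjection                   // hypothetical projection contract
{
    Task HandleAsync(object @event);           // dispatch-by-type omitted for brevity
    Task CommitUnitOfWorkAsync();
}

public class ProjectionPipeline
{
    private readonly IProjection _projection;
    private int _pending;   // events deserialised but not yet processed
    private int _processed; // events processed since the last commit

    private readonly TransformBlock<RawEvent, object> _deserialise;
    private readonly ActionBlock<object> _process;

    public ProjectionPipeline(IProjection projection)
    {
        _projection = projection;

        // Ordered, parallel deserialisation with a bounded queue of 200.
        _deserialise = new TransformBlock<RawEvent, object>(e =>
            {
                Interlocked.Increment(ref _pending);
                return JsonConvert.DeserializeObject(e.Json);
            },
            new ExecutionDataflowBlockOptions
            {
                BoundedCapacity = 200,
                MaxDegreeOfParallelism = Environment.ProcessorCount
            });

        // Single-threaded processing keeps the uncommitted unit of work consistent.
        _process = new ActionBlock<object>(async @event =>
            {
                await _projection.HandleAsync(@event);
                var processed = Interlocked.Increment(ref _processed);

                // Commit after 1000 events or on buffer underrun (queue drained).
                if (processed >= 1000 || processed == Volatile.Read(ref _pending))
                {
                    await _projection.CommitUnitOfWorkAsync();
                    // Reduce, don't reset: other threads may have added pending events.
                    Interlocked.Add(ref _pending, -processed);
                    Interlocked.Add(ref _processed, -processed);
                }
            },
            new ExecutionDataflowBlockOptions { BoundedCapacity = 200 });

        _deserialise.LinkTo(_process, new DataflowLinkOptions { PropagateCompletion = true });
    }

    // The Event Store subscription feeds events in here; SendAsync honours BoundedCapacity.
    public Task<bool> EnqueueAsync(RawEvent e) => _deserialise.SendAsync(e);
}

The single-threaded ActionBlock is the design choice that makes the shared unit of work safe without locks; only the deserialisation fans out across cores.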

The values of a batch size of 1000 events and a queue size of 200 are the result of experimentation. This also shows further options for improvement by tweaking these values for each projection independently: a projection that adds a new row for every event slows down considerably with a batch size of 10.000, while other projections that merely update a few entities benefit from a larger batch size.

The deserialisation queue size is also vital for good performance.

So, TL;DR:

Entity Framework is fast enough to handle up to 10.000 modifications per second, per parallel thread. Utilise your unit of work and avoid committing every single change, especially in CQRS, where the projection is the only thread making any changes to the data. Properly interleave parallel tasks; don't just blindly async everything.

answered Sep 27 '22 by urbanhusky