I'm trying to improve the performance of a complex database read operation. I've found some code that, in limited testing, performs much faster than previous attempts using a variety of techniques, including a hand-tuned stored procedure. It's using Dapper, but Dapper isn't the primary source of concern.
public IEnumerable<Order> GetOpenOrders(Guid vendorId)
{
var tasks = GetAllOrders(vendorId)
.Where(order => !order.IsCancelled)
.Select(async order => await GetLineItems(order))
.Select(async order =>
{
var result = (await order);
return result.GetBalance() > 0M ? result : null;
})
.Select(async order => await PopulateName(await order))
.Select(async order => await PopulateAddress(await order))
.ToList();
Task.WaitAll(tasks.ToArray<Task>());
return tasks.Select(t => t.Result);
}
private IDbConnection CreateConnection()
{
return new SqlConnection("...");
}
private IEnumerable<Order> GetAllOrders(Guid vendorId)
{
using (var db = CreateConnection())
{
return db.Query<Order>("...");
}
}
private async Task<Order> GetLineItems(Order order)
{
using (var db = CreateConnection())
{
var lineItems = await db.QueryAsync<LineItem>("...");
order.LineItems = await Task.WhenAll(lineItems.Select(async li => await GetPayments(li)));
return order;
}
}
private async Task<LineItem> GetPayments(LineItem lineItem)
{
using (var db = CreateConnection())
{
lineItem.Payments = await db.QueryAsync<Payment>("...");
return lineItem;
}
}
private async Task<Order> PopulateName(Order order)
{
using (var db = CreateConnection())
{
order.Name = (await db.QueryAsync<string>("...")).FirstOrDefault();
return order;
}
}
private async Task<Order> PopulateAddress(Order order)
{
using (var db = CreateConnection())
{
order.Address = (await db.QueryAsync<string>("...")).FirstOrDefault();
return order;
}
}
This is somewhat simplified, but I hope it highlights my primary issue:
I know that it's possible to make it safer by reusing the same connection, but creating many connections makes it faster by an order of magnitude in my testing. I've also tested/counted the number of concurrent connections from the database itself, and I'm seeing hundreds of statements running at the same time.
Some related questions:
The biggest problem with your code is that you are fetching way more data from your database than you actually need to satisfy the query. This is known as extraneous fetching.
Dapper is great, but unlike Entity Framework and other solutions, it is not a LINQ provider. You must express the entirety of your query in the SQL, including the WHERE
clause. Dapper just helps you materialize it into objects. It returns IEnumerable<T>
, not IQueryable<T>
.
So your code:
GetAllOrders(vendorId)
.Where(order => !order.IsCancelled)
Actually requests all orders in the database - not just uncancelled ones. The filter is happening in memory, afterwards.
Likewise:
order.Name = (await db.QueryAsync<string>("...")).FirstOrDefault();
The ...
of your query better include a SELECT TOP 1
, or you will actually get all items back, just to throw away all but the first item.
Also, consider that you are making many smaller calls to populate each segment of an order. With each order, you have 3 additional queries, with N additional rows. This is a common anti-pattern, known as SELECT N+1. It is always better to express the entirety of your query as a "chunky" operation than to emit many chatty queries to the database. This is also described as the chatty I/O anti-pattern.
With regards to the async questions - while there is nothing inherently wrong with making multiple database calls in parallel, that's not exactly what you're doing here. Since you're awaiting each step along the way, you're still doing things serially.
Well, at least you're doing them serially for each order. You are getting some parallelism in the outer loop. But all of the inner stuff is essentially serial. The Task.WaitAll
will block until all of the outer tasks (one per order filtered) are complete.
Another problem is that you aren't in an async context when you call GetOpenOrders
in the first place. The real benefits of async/await aren't realized until you have async all the way up and down the stack. I also suggest you watch this video series on async from Channel 9.
My recommendation is to:
Query
if you're in a synchronous context (IEnumerable<Order> GetOpenOrders
), or use QueryAsync
if you're in an asynchronous context (async Task<IEnumerable<Order>> GetOpenOrdersAsync
). Don't try to use the async query from a non-async context.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With