
Can I create many DBConnections asynchronously?

I'm trying to improve the performance of a complex database read operation. I've found some code that, in limited testing, performs much faster than previous attempts using a variety of techniques, including a hand-tuned stored procedure. It's using Dapper, but Dapper isn't the primary source of concern.

public IEnumerable<Order> GetOpenOrders(Guid vendorId)
{
    var tasks = GetAllOrders(vendorId)
        .Where(order => !order.IsCancelled)
        .Select(async order => await GetLineItems(order))
        .Select(async order =>
        {
            var result = (await order);
            return result.GetBalance() > 0M ? result : null;
        })
        .Select(async order => await PopulateName(await order))
        .Select(async order => await PopulateAddress(await order))
        .ToList();
    Task.WaitAll(tasks.ToArray<Task>());
    return tasks.Select(t => t.Result);
}

private IDbConnection CreateConnection()
{
    return new SqlConnection("...");
}

private IEnumerable<Order> GetAllOrders(Guid vendorId)
{
    using (var db = CreateConnection())
    {
        return db.Query<Order>("...");
    }
}

private async Task<Order> GetLineItems(Order order)
{
    using (var db = CreateConnection())
    {
        var lineItems = await db.QueryAsync<LineItem>("...");
        order.LineItems = await Task.WhenAll(lineItems.Select(async li => await GetPayments(li)));
        return order;
    }
}

private async Task<LineItem> GetPayments(LineItem lineItem)
{
    using (var db = CreateConnection())
    {
        lineItem.Payments = await db.QueryAsync<Payment>("...");
        return lineItem;
    }
}

private async Task<Order> PopulateName(Order order)
{
    using (var db = CreateConnection())
    {
        order.Name = (await db.QueryAsync<string>("...")).FirstOrDefault();
        return order;
    }
}

private async Task<Order> PopulateAddress(Order order)
{
    using (var db = CreateConnection())
    {
        order.Address = (await db.QueryAsync<string>("...")).FirstOrDefault();
        return order;
    }
}

This is somewhat simplified, but I hope it highlights my primary issue:

  • Is this code a good idea?

I know that it's possible to make it safer by reusing the same connection, but creating many connections makes it faster by an order of magnitude in my testing. I've also counted the concurrent connections on the database server itself, and I'm seeing hundreds of statements running at the same time.

Some related questions:

  • Should I use more async (e.g., in CreateConnection() and GetAllOrders), or less?
  • What kind of testing can/should I do before I put this kind of code in production?
  • Are there alternative strategies that can produce similar performance but require fewer connections?
asked Apr 18 '15 01:04 by ayers


1 Answer

The biggest problem with your code is that you are fetching way more data from your database than you actually need to satisfy the query. This is known as extraneous fetching.

Dapper is great, but unlike Entity Framework and other solutions, it is not a LINQ provider. You must express the entirety of your query in the SQL, including the WHERE clause. Dapper just helps you materialize it into objects. It returns IEnumerable<T>, not IQueryable<T>.

So your code:

GetAllOrders(vendorId)
    .Where(order => !order.IsCancelled)

actually retrieves every order the SQL returns, cancelled ones included. The filtering happens in memory, afterwards.
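Because Dapper hands back a plain IEnumerable<T>, a chained .Where is LINQ to Objects: every row has already crossed the wire before the predicate runs. The sketch below is a minimal in-memory illustration of that, assuming a hypothetical iterator (QueryAllOrders) standing in for a Dapper query whose SQL lacks the cancellation filter; nothing here touches a real database.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: just enough of an order for the demo.
public sealed class Order
{
    public int Id { get; set; }
    public bool IsCancelled { get; set; }
}

public static class ExtraneousFetchDemo
{
    // Number of rows "transferred" by the simulated query during the last call.
    public static int RowsFetched { get; private set; }

    // Hypothetical stand-in for a Dapper query whose SQL has no cancellation filter:
    // every second order is cancelled, and each yielded row is counted as fetched.
    public static IEnumerable<Order> QueryAllOrders(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            RowsFetched++;
            yield return new Order { Id = i, IsCancelled = i % 2 == 0 };
        }
    }

    public static List<Order> GetUncancelled(int count)
    {
        RowsFetched = 0;
        // LINQ to Objects: the predicate runs only after each row has been fetched.
        return QueryAllOrders(count).Where(o => !o.IsCancelled).ToList();
    }
}
```

Calling GetUncancelled(10) keeps 5 orders but still fetches all 10 rows. In real code the fix is to move the condition into the SQL text itself, for example a clause along the lines of `WHERE VendorId = @vendorId AND IsCancelled = 0` with `new { vendorId }` as Dapper's parameter object (those column names are assumptions about your schema).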

Likewise:

order.Name = (await db.QueryAsync<string>("...")).FirstOrDefault();

The ... of your query had better include SELECT TOP 1; otherwise every matching row is sent back, only for all but the first to be thrown away.

Also, consider that you are making many small calls to populate each part of an order. For each order, the code issues three additional queries (line items, name, and address), plus one more per line item to fetch its payments. This is a common anti-pattern known as SELECT N+1. It is generally better to express the whole query as one "chunky" operation than to send many chatty queries to the database; this is also described as the chatty I/O anti-pattern.
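To put a rough number on the chattiness, count the round trips the question's code makes: one query for the orders, then, for each of the n non-cancelled orders, one for its line items, one per line item for payments (L_i of them for order i), and one each for the name and the address. Assuming every order goes through all of those steps:

```latex
Q = 1 + \sum_{i=1}^{n} \left( 3 + L_i \right)
```

With n = 100 orders of 5 line items each (illustrative figures, not measurements), that is Q = 1 + 100 · (3 + 5) = 801 round trips, compared with a single round trip for one joined query.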

With regard to the async questions: while there is nothing inherently wrong with making multiple database calls in parallel, that's not exactly what you're doing here. Since you await each step before starting the next, the steps for any one order still run serially.

You do get some parallelism in the outer loop, because a separate chain of tasks is started for each order, and the payment queries for an order's line items run concurrently inside Task.WhenAll. But the main steps within each order (line items, then name, then address) run one after another. Task.WaitAll then blocks the calling thread until all of the outer tasks (one per non-cancelled order) are complete.
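To make the serial-versus-concurrent distinction concrete, here is a sketch with simulated lookups: Task.Delay stands in for a query, and a counter records how many lookups are in flight at once. The names (ConcurrencyDemo, LookupAsync) are hypothetical. Awaiting one call before starting the next keeps at most one running, while starting independent calls first and then awaiting Task.WhenAll lets them overlap. Within one order, the name and address lookups don't depend on each other, so they are candidates for the second pattern.

```csharp
using System.Threading.Tasks;

// Hypothetical demo: Task.Delay stands in for a database query, and a counter
// records the highest number of lookups running at the same moment.
public static class ConcurrencyDemo
{
    private static readonly object Gate = new object();
    private static int _inFlight;
    private static int _maxInFlight;

    public static int MaxInFlight
    {
        get { lock (Gate) { return _maxInFlight; } }
    }

    public static void Reset()
    {
        lock (Gate) { _inFlight = 0; _maxInFlight = 0; }
    }

    // Simulated lookup, standing in for something like PopulateName or PopulateAddress.
    private static async Task<string> LookupAsync(string value)
    {
        lock (Gate)
        {
            _inFlight++;
            if (_inFlight > _maxInFlight) _maxInFlight = _inFlight;
        }
        try
        {
            await Task.Delay(50);
            return value;
        }
        finally
        {
            lock (Gate) { _inFlight--; }
        }
    }

    // Each call is awaited before the next one starts, so they never overlap.
    public static async Task<(string Name, string Address)> SequentialAsync()
    {
        string name = await LookupAsync("name");
        string address = await LookupAsync("address");
        return (name, address);
    }

    // Both independent calls are started first, then awaited together.
    public static async Task<(string Name, string Address)> ConcurrentAsync()
    {
        Task<string> nameTask = LookupAsync("name");
        Task<string> addressTask = LookupAsync("address");
        await Task.WhenAll(nameTask, addressTask);
        return (await nameTask, await addressTask);
    }
}
```

Keep in mind that each lookup that overlaps with another needs its own connection, so more concurrency also means more simultaneous connections, which is exactly the trade-off the question is asking about.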

Another problem is that you aren't in an async context when you call GetOpenOrders in the first place. The real benefits of async/await aren't realized until you have async all the way up and down the stack. I also suggest watching the video series on async from Channel 9.
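A minimal sketch of what async all the way up could look like for the top-level method, assuming a hypothetical LoadOrderAsync in place of the real Dapper QueryAsync calls. The method returns a Task and awaits Task.WhenAll instead of blocking on Task.WaitAll, so the caller can await it in turn.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical model for the sketch.
public sealed class Order
{
    public int Id { get; set; }
    public decimal Balance { get; set; }
}

public static class OrderService
{
    // Hypothetical async data access; a real version would await Dapper's QueryAsync.
    // Even ids get a zero balance so the filter below has something to drop.
    private static async Task<Order> LoadOrderAsync(int id)
    {
        await Task.Delay(10);
        return new Order { Id = id, Balance = id % 2 == 0 ? 0M : 10M };
    }

    // Async all the way: no Task.WaitAll, no .Result; the caller awaits this method.
    public static async Task<IReadOnlyList<Order>> GetOpenOrdersAsync(IEnumerable<int> ids)
    {
        // Task.WhenAll returns the results in the same order as the input tasks.
        Order[] orders = await Task.WhenAll(ids.Select(LoadOrderAsync));
        // Drop zero-balance orders here instead of passing nulls further along.
        return orders.Where(o => o.Balance > 0M).ToList();
    }
}
```

An async caller would then write `var orders = await OrderService.GetOpenOrdersAsync(ids);`. Awaiting Task.WhenAll releases the calling thread while the lookups run, whereas Task.WaitAll keeps it blocked for the whole duration.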

My recommendation is to:

  • Determine the full query you need to run to retrieve all the data from the database, but not more than you actually need.
  • Execute that query in Dapper. Use Query if you're in a synchronous context (IEnumerable<Order> GetOpenOrders), or use QueryAsync if you're in an asynchronous context (async Task<IEnumerable<Order>> GetOpenOrdersAsync). Don't try to use the async query from a non-async context.
  • Use Dapper's multi-mapping feature to retrieve multiple objects from a single query.
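As a sketch of the multi-mapping step, assume a single query that joins orders to line items, so there is one row per line item with the order's columns repeated on each row. Dapper calls a map function once per row; the function below de-duplicates the orders through a dictionary and attaches each line item to its parent. The class and member names are hypothetical. With Dapper, the returned callback would be passed as the map argument, for example `db.Query<Order, LineItem, Order>(sql, map, new { vendorId }, splitOn: "Id")`, where the right splitOn column depends on your SELECT list. Because Dapper yields one result per row, the distinct orders are then `lookup.Values`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical models for the sketch.
public sealed class LineItem
{
    public int Id { get; set; }
    public int OrderId { get; set; }
}

public sealed class Order
{
    public int Id { get; set; }
    public List<LineItem> LineItems { get; } = new List<LineItem>();
}

public static class MultiMapDemo
{
    // Builds a callback in the shape Dapper's two-type multi-mapping expects
    // (Func<Order, LineItem, Order>). It is invoked once per joined row, with a
    // fresh Order instance each time, so orders are de-duplicated by Id.
    public static Func<Order, LineItem, Order> CreateMap(Dictionary<int, Order> lookup)
    {
        return (order, lineItem) =>
        {
            if (!lookup.TryGetValue(order.Id, out Order existing))
            {
                existing = order;
                lookup.Add(order.Id, existing);
            }
            // With a LEFT JOIN, an order without line items may arrive with a null lineItem.
            if (lineItem != null)
            {
                existing.LineItems.Add(lineItem);
            }
            return existing;
        };
    }
}
```

Payments could be attached the same way with a three-type overload such as `Query<Order, LineItem, Payment, Order>`, at the cost of more repeated columns per row.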
answered Oct 10 '22 14:10 by Matt Johnson-Pint