
Should I materialize my LINQ query for database performance reasons?

Tags:

linq

I have the following code:

var result = Database.Set<Product>()
    .Where(x => x.CreatedAt >= fromDate
             && x.CreatedAt <= toDate);

var group1 = result
    .GroupBy(x => new { Id = x.Id, Name = x.Name })
    .Select(x => new { Id = x.Key.Id, Name = x.Key.Name });
var group2 = result
    .GroupBy(x => new { Id = x.Id, Price = x.Price })
    .Select(x => new { Id = x.Key.Id, Price = x.Key.Price });
var group3 = result
    .GroupBy(x => new { Id = x.Id, Category = x.Category })
    .Select(x => new { Id = x.Key.Id, Category = x.Key.Category });

Please don't pay attention to the GroupBy conditions. Let's assume I need the data in the three groups for some further processing down the line.

I assume the code above will fire at least three SQL queries to produce the results. Would it be wrong to do this instead?

var result = Database.Set<Product>()
    .Where(x => x.CreatedAt >= fromDate
             && x.CreatedAt <= toDate)
    .ToList();

At this point I assume there will be only one SQL call to fetch the result set and place it in memory. I further assume that the three GroupBy operations will then run against the in-memory collection instead of firing more SQL queries.

Is my reasoning correct? Is there any upside/downside to this approach?

Thomas asked Nov 14 '22 14:11


1 Answer

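Your reasoning is correct: calling ToList() executes a single SQL query and pulls the results into local memory, and the subsequent GroupBy and Select operations then run entirely in memory. Without ToList(), each of the three grouped queries is translated to its own SQL query when it is enumerated.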
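To make the difference concrete, here is a minimal sketch that uses LINQ to Objects in place of a real database. The `Query()` iterator and the `Executions` counter are hypothetical stand-ins: the iterator body runs again on every enumeration, which roughly mimics a deferred query hitting the server each time it is enumerated. The `Product` record and its sample rows are likewise assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Product(int Id, string Name, decimal Price, string Category);

public static class DeferredDemo
{
    // Number of times the source was enumerated; stands in for SQL round trips.
    public static int Executions;

    // Hypothetical stand-in for a deferred database query: the body runs
    // again every time the sequence is enumerated.
    public static IEnumerable<Product> Query()
    {
        Executions++;
        yield return new Product(1, "Pen", 1.50m, "Office");
        yield return new Product(2, "Desk", 120m, "Furniture");
        yield return new Product(3, "Lamp", 30m, "Furniture");
    }

    // Runs three groupings over the query and reports how often the source ran.
    public static int CountExecutions(bool materialize)
    {
        Executions = 0;
        IEnumerable<Product> result = materialize ? Query().ToList() : Query();

        // Each trailing ToList() forces the grouping to be evaluated.
        var byName = result.GroupBy(p => new { p.Id, p.Name }).Select(g => g.Key).ToList();
        var byPrice = result.GroupBy(p => new { p.Id, p.Price }).Select(g => g.Key).ToList();
        var byCategory = result.GroupBy(p => new { p.Id, p.Category }).Select(g => g.Key).ToList();

        return Executions;
    }

    public static void Main()
    {
        Console.WriteLine($"Deferred:     {CountExecutions(materialize: false)} executions");
        Console.WriteLine($"Materialized: {CountExecutions(materialize: true)} executions");
    }
}
```

In this sketch the deferred version enumerates the source three times and the materialized version once. Against a real provider the counts depend on when and how often each query is enumerated, so treat the numbers as an illustration rather than a measurement.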

Since your subsequent LINQ queries only regroup the same rows, there isn't much to gain from handing that work to the SQL server and downloading the data several times in different shapes. The main advantage of grouping on the server would be a smaller memory footprint on the client. If the data is too big to fit in memory on the local machine, you have to do the grouping on the SQL server and pull the results down to the client piecewise.

If the subsequent LINQ queries filtered the data further instead of just regrouping it, the decision to call ToList() on the first query would be less clear-cut. The first query could pull down far more data than you need, which could easily cost more than running three queries that each pull down only a little data.
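A rough way to picture that trade-off, again with LINQ to Objects standing in for the database: the hypothetical `Table()` below plays a 1000-row table, and the three predicates are made up for illustration. With a real provider, a translatable `Where` placed before `ToList()` becomes part of the SQL, so only the matching rows are transferred.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterDemo
{
    // Hypothetical stand-in for a 1000-row table; each int plays the role of a row.
    static IEnumerable<int> Table() => Enumerable.Range(1, 1000);

    // One broad query, materialized up front: every row is pulled into memory.
    public static int RowsPulledByOneBroadQuery() => Table().ToList().Count;

    // Three narrow queries, each filtered before it is materialized.
    public static int RowsPulledByThreeNarrowQueries()
    {
        var a = Table().Where(n => n % 100 == 0).ToList(); // 10 rows
        var b = Table().Where(n => n % 250 == 0).ToList(); // 4 rows
        var c = Table().Where(n => n > 995).ToList();      // 5 rows
        return a.Count + b.Count + c.Count;
    }

    public static void Main()
    {
        Console.WriteLine(RowsPulledByOneBroadQuery());      // 1000
        Console.WriteLine(RowsPulledByThreeNarrowQueries()); // 19
    }
}
```

Here three selective queries move 19 rows in total, while the single broad query moves all 1000. Which side wins in practice depends on how selective the filters are and on the cost of each round trip.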

Another factor in favor of pulling the data in one query and regrouping it in local memory is consistency between the three final result sets. If you run three SQL queries, each may see different data because of updates happening concurrently on the server. By pulling the data down once, you take a snapshot that is isolated from concurrent updates, which guarantees that the three groupings are built from exactly the same rows, just arranged differently.
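The snapshot effect can be sketched the same way. In this hypothetical example a `List<decimal>` plays the table, and a row added between two reads simulates an update from another client. The deferred query sees the change on its second read, while the materialized list does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns the row counts seen by two consecutive reads of the same query,
    // with a simulated concurrent insert happening between them.
    public static (int First, int Second) ReadTwice(bool materialize)
    {
        // Hypothetical stand-in for a table of product prices.
        var table = new List<decimal> { 1.50m, 120m, 30m };

        IEnumerable<decimal> query = table.Where(price => price > 1m);
        IEnumerable<decimal> result = materialize ? query.ToList() : query;

        int first = result.Count();
        table.Add(75m); // simulated update by another client
        int second = result.Count();

        return (first, second);
    }

    public static void Main()
    {
        Console.WriteLine(ReadTwice(materialize: false)); // (3, 4)
        Console.WriteLine(ReadTwice(materialize: true));  // (3, 3)
    }
}
```

The materialized copy stays stable only because it is a copy, so it can also be stale. That is usually what you want when several results must agree with each other.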

dthorpe answered Jun 13 '23 06:06