I have been looking at Entity Framework performance, particularly around the use of Includes and the time taken to both generate and execute the various queries.
I am going to detail changes I have made, but please correct me if you think any of these assumptions are wrong.
Firstly we have around 10,000 items (not many) in a DB and the database is significantly normalized (which results in a significant number of navigation properties). Currently, the approach is to lazy load everything, and given that requesting one item can spool off tens of DB requests, the performance is quite poor, particularly for larger sets of data. (This is an inherited project and step one is trying to improve performance without significant restructuring)
So my first step was to take the results of a query and then apply the Includes for the navigation properties only to those results. I know this technically performs 2 queries, but if we have 10,000 items stored, but only want to return 10 items, it makes more sense to only include the navigation properties on those 10 items.
Secondly, where multiple includes are used on a query result and that result set size is quite large, it still suffered from poor performance. I have been pragmatic about when to eager load and when to leave the lazy loading in place. My next change was to load query includes in batches, so performing:
query.Include(q => q.MyInclude).Load();
This once again significantly improved performance, although a few more DB calls (one for each batch of includes) it was quicker than a large query or at the very least reduced the overhead of Entity Framework trying to produce that large query.
So the code now looks something like this:
var query = ctx.Filters.Where(x => x.SessionId == id) .Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs); query .Include(x => x.ItemNav1) .Include(x => x.ItemNav2).Load(); query .Include(x => x.ItemNav3) .Include(x => x.ItemNav4).Load(); query .Include(x => x.ItemNav5) .Include(x => x.ItemNav6).Load();
Now, this is reasonably performant, however, it would be nice to improve this further.
I had considered using LoadAsync()
, which after a bit more refactoring would be possible and would better fit with the rest of the architecture.
However, you can only execute one query at a time on a DB context. So I was wondering if there was any way to possibly create a new DB context, perform LoadAsync()
on each group of navigation properties (asynchronously) and then concatenate all of the results.
I know technically how you might create a new context, fire off a LoadAsync()
for each navigation group, but not how to concatenate the results, I don't know if it is definitely possible or whether it goes against good practice.
So my question is; is this possible or, is there another way I can further improve performance? I'm trying to stick with what Entity Framework provides rather than crafting some stored procs. Thanks
UPDATE
Regarding the performance disparity, I'm seeing between using all Includes in one statement and Loading these in small groups. When running a query that returns 6000 items. (Using SQL profiler and VS diagnostics to determine times)
Grouped Includes: In total takes ~8 seconds to execute the includes.
Includes in one statement: SQL query is taking ~30 seconds to load. (Often getting timeouts)
After a bit more investigation, I don't think there is much overhead when EF converts the SQL results to models. However we have seen nearly 500ms taken for EF to generate complex queries, which isn't ideal, but I'm not sure this can be resolved
UPDATE 2
With Ivan's help and following this https://msdn.microsoft.com/en-gb/data/hh949853.aspx we were able to improve things further, particularly using SelectMany
. I would highly recommend the MSDN article to anyone attempting to improve their EF performance.
At its heart, Entity Framework is a way of exposing . NET objects without actually knowing their values, but then fetching / updating those values from the database behind the scenes when you need them. It's important to be aware of when EF is going to hit the database – a process called materialization.
The AsNoTracking method tells Entity Framework to stop that additional work and so, it can improve the performance of your application. So, in theory, a query with AsNoTracking should perform better than without.
AsNoTracking is used to increase performance, but returning entities instead of view models can affect performance far more.
Your second approach relies on the EF navigation property fixup process. The problem is though that every
query.Include(q => q.ItemNavN).Load();
statement will also include all the master record data along with the related entity data.
Using the same basic idea, one potential improvement could be to execute one Load
per each navigation property, replacing the Include
with either Select
(for references) or SelectMany
(for collections) - something similar to how EF Core processes the Include
s internally.
Taking your second approach example, you could try the following and compare the performance:
var query = ctx.Filters.Where(x => x.SessionId == id) .Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs); query.Select(x => x.ItemNav1).Load(); query.Select(x => x.ItemNav2).Load(); query.Select(x => x.ItemNav3).Load(); query.Select(x => x.ItemNav4).Load(); query.Select(x => x.ItemNav5).Load(); query.Select(x => x.ItemNav6).Load(); var result = query.ToList(); // here all the navigation properties should be populated
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With