I have been debugging some slow code and it seems that the culprit is the EF code posted below. It takes 4-5 seconds when the query is evaluated at a later stage. I'm trying to get it to run in under 1 second.
I have tested this using SQL Server Profiler, and it shows that a whole series of SQL statements is executed. It also confirms that it takes 3-4 seconds before SQL Server is done executing them.
I have read other similar questions about the use of Include() and it does seem that there is a performance penalty when using it. I've tried to split the below code into several different queries but it's not making much of a difference.
Any idea how I can get the below to execute faster?
Currently the web app I'm working on is just showing an empty iframe while waiting for the below to complete. If I cannot get faster execution time I have to split it up and partially load the iframe with data or go with another asynchronous solution. Any ideas here would also be appreciated!
using (var scope = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
{
    formInstance = context.FormInstanceSet
        .Includes(x => x.Include(fi => fi.FormDefinition).Include(fd => fd.FormSectionDefinitions).Include(fs => fs.FormStateDefinitionEditableSections))
        .Includes(x => x.Include(fi => fi.FormDefinition).Include(fd => fd.FormStateDefinitions))
        .Includes(x => x.Include(fi => fi.FormSectionInstances).Include(fs => fs.FormFieldInstances).Include(ff => ff.FormFieldDefinition).Include(ffd => ffd.FormFieldMetaDataDefinition).Include(ffmdd => ffmdd.ComplexTypePropertyNames))
        .Include(x => x.CurrentFormStateInstance)
        .Include(x => x.Files)
        .FirstOrDefault(x => x.FormInstanceIdentifier == formInstanceIdentifier);
    scope.Complete();
}
Entity Framework is slow on the first query because that is when EF builds and compiles the model. If you are using EF 6.2 with Code First, you can use a model cache, which loads a prebuilt edmx; without it, EF generates the model on every startup.
According to Microsoft's release benchmarks, EF Core 6.0 itself is 31% faster at executing queries than EF Core 5.0, and heap allocations have been reduced by 43%.
Entity Framework is an ORM that uses LINQ to access the database and generates the SQL for you, whereas ADO.NET requires more hand-written code. In return, raw ADO.NET is generally faster than Entity Framework.
tl;dr Multiple Includes blow up the SQL result set. Soon it becomes cheaper to load data by multiple database calls instead of running one mega statement. Try to find the best mixture of Include and Load statements.
"it does seem that there is a performance penalty when using Include"
That's an understatement! Multiple Includes quickly blow up the SQL query result both in width and in length. Why is that?
Includes
(This part applies to Entity Framework classic, v6 and earlier.)
Let's say we have a root entity Root, a parent entity Root.Parent, two child collections Root.Children1 and Root.Children2, and this LINQ statement:
Root.Include("Parent").Include("Children1").Include("Children2")
This builds a SQL statement that has the following structure:
SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children1
UNION
SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children2
These <PseudoColumns> consist of expressions like CAST(NULL AS int) AS [C2], and they serve to have the same number of columns in all UNION-ed queries. The first part adds pseudo columns for Children2, the second part adds pseudo columns for Children1.
This is what it means for the size of the SQL result set:
The number of columns in the SELECT clause is the sum of all columns in the four tables.
The number of rows is the sum of the records in the included child collections.
Since the total number of data points is columns * rows, each additional Include increases both factors, so the total number of data points in the result set grows quadratically rather than linearly. Let me demonstrate that by taking Root again, now with an additional Children3 collection. If all tables have 5 columns and 100 rows, we get:
One Include (Root + 1 child collection): 10 columns * 100 rows = 1000 data points.
Two Includes (Root + 2 child collections): 15 columns * 200 rows = 3000 data points.
Three Includes (Root + 3 child collections): 20 columns * 300 rows = 6000 data points.
With 12 Includes this would amount to 78000 data points!
Conversely, if you get all records for each table separately instead of 12 Includes, you have 13 * 5 * 100 data points: 6500, less than 10%!
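The arithmetic above can be reproduced with a few lines of C#. This is only a sketch of the simplified model used in this answer (every table has the same number of columns and rows); the class and method names are made up for illustration and are not part of Entity Framework.

```csharp
using System;

public static class IncludeGrowth
{
    // Data points in the single UNION-ed result set for `includes` child-collection Includes.
    // Columns: the root plus every included table. Rows: one block of rows per included collection.
    public static long WithIncludes(int includes, int columnsPerTable, int rowsPerTable)
    {
        long columns = (long)(includes + 1) * columnsPerTable;
        long rows = (long)includes * rowsPerTable;
        return columns * rows;
    }

    // Data points when the root and each child table are fetched by separate queries.
    public static long WithSeparateQueries(int includes, int columnsPerTable, int rowsPerTable)
    {
        return (long)(includes + 1) * columnsPerTable * rowsPerTable;
    }

    public static void Main()
    {
        // Same assumption as in the text: 5 columns and 100 rows per table.
        foreach (var n in new[] { 1, 2, 3, 12 })
        {
            Console.WriteLine(
                $"{n} Include(s): {WithIncludes(n, 5, 100)} data points in one query, " +
                $"{WithSeparateQueries(n, 5, 100)} with separate queries");
        }
    }
}
```

The single-query figure is a product of two terms that both grow with the number of Includes, while the separate-query figure grows linearly, which is why the gap widens so quickly.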
Now these numbers are somewhat exaggerated in that many of these data points will be null, so they don't contribute much to the actual size of the result set that is sent to the client. But the query size and the task for the query optimizer certainly get affected negatively by increasing numbers of Includes.
So using Includes is a delicate balance between the cost of database calls and data volume. It's hard to give a rule of thumb, but by now you can imagine that the data volume generally quickly outgrows the cost of extra calls if there are more than ~3 Includes for child collections (but quite a bit more for parent Includes, which only widen the result set).
The alternative to Include is to load data in separate queries:
context.Configuration.LazyLoadingEnabled = false;
var rootId = 1;
context.Children1.Where(c => c.RootId == rootId).Load();
context.Children2.Where(c => c.RootId == rootId).Load();
return context.Roots.Find(rootId);
This loads all required data into the context's cache. During this process, EF executes relationship fixup, by which it auto-populates navigation properties (Root.Children1 etc.) with the loaded entities. The end result is identical to the statement with Includes, except for one important difference: the child collections are not marked as loaded in the entity state manager, so EF will try to trigger lazy loading if you access them. That's why it's important to turn off lazy loading.
In reality, you will have to figure out which combination of Include and Load statements works best for you.
Each Include also increases query complexity, so the database's query optimizer will have to make increasingly more effort to find the best query plan. At some point this may no longer succeed. Also, when some vital indexes are missing (esp. on foreign keys), performance may suffer by adding Includes, even with the best query plan.
For some reason, the behavior described above, UNIONed queries, was abandoned as of EF Core 3. It now builds one query with joins. When the query is "star"-shaped (1), this leads to Cartesian explosion (in the SQL result set). I can only find a note announcing this breaking change, but it doesn't say why.
To counter this Cartesian explosion, Entity Framework Core 5 introduced the concept of split queries, which enables loading related data in multiple queries. It prevents building one massive, multiplied SQL result set. Also, because of lower query complexity, it may reduce the time it takes to fetch data even with multiple roundtrips. However, it may lead to inconsistent data when concurrent updates occur.
(1) Multiple 1:n relationships off of the query root.
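As a rough sketch of what a split query looks like in EF Core 5 or later: AsSplitQuery() is the actual EF Core extension method, but the Root, Child1, Child2, SampleContext and RootQueries types below are hypothetical stand-ins for the Root example used earlier, and the database provider is left to whoever builds the DbContextOptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities mirroring the Root / Children1 / Children2 example.
public class Root
{
    public int Id { get; set; }
    public List<Child1> Children1 { get; set; } = new();
    public List<Child2> Children2 { get; set; } = new();
}

public class Child1
{
    public int Id { get; set; }
    public int RootId { get; set; }
}

public class Child2
{
    public int Id { get; set; }
    public int RootId { get; set; }
}

// Hypothetical context; the provider (SQL Server, SQLite, ...) is configured by the caller.
public class SampleContext : DbContext
{
    public SampleContext(DbContextOptions<SampleContext> options) : base(options) { }

    public DbSet<Root> Roots => Set<Root>();
}

public static class RootQueries
{
    // Loads one Root with both child collections.
    // AsSplitQuery() makes EF Core issue one SQL query per included collection
    // instead of a single joined query. Returns null if no Root matches.
    public static Root LoadWithChildren(SampleContext context, int rootId)
    {
        return context.Roots
            .Include(r => r.Children1)
            .Include(r => r.Children2)
            .AsSplitQuery()
            .FirstOrDefault(r => r.Id == rootId);
    }
}
```

Because the collections are fetched by separate statements, the consistency caveat mentioned above applies: rows changed between the statements can produce a mixed snapshot unless the queries run in a suitable transaction.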
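I had a similar issue with a query that had 15+ "Include" statements and generated a result of 2M+ rows in 7 minutes.
The solution that worked for me was to disable lazy loading, disable automatic change detection, and split the big query into smaller ones.
A sample can be found below: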
public IQueryable<CustomObject> PerformQuery(int id)
{
    // Avoid accidental lazy loads and the cost of change detection while loading.
    ctx.Configuration.LazyLoadingEnabled = false;
    ctx.Configuration.AutoDetectChangesEnabled = false;

    IQueryable<CustomObject> customObjectQueryable = ctx.CustomObjects.Where(x => x.Id == id);

    // First chunk: one related object with a few of its collections.
    var selectQuery = customObjectQueryable.Select(x => x.YourObject)
        .Include(c => c.YourFirstCollection)
        .Include(c => c.YourFirstCollection.OtherCollection)
        .Include(c => c.YourSecondCollection);

    // Second chunk: another collection, loaded by its own query.
    var otherObjects = customObjectQueryable.SelectMany(x => x.OtherObjects);

    // Execute both queries; the results land in the context and are fixed up there.
    selectQuery.FirstOrDefault();
    otherObjects.ToList();

    return customObjectQueryable;
}
IQueryable is needed in order to do all the filtering at the server side. IEnumerable would perform the filtering in memory and this is a very time consuming process. Entity Framework will fix up any associations in memory.
With .NET 5 and EF Core, I used this solution:
_context.ChangeTracker.LazyLoadingEnabled = false;
_context.ChangeTracker.AutoDetectChangesEnabled = false;

var mainObj = _context.MarinzonServiceItems.Where(filter);
var returnQuery = mainObj.Include(x => x.Service);

// Load each related set with its own query instead of one large statement.
returnQuery.Include(x => x.User).Load();
returnQuery.Include(x => x.Category).Load();
returnQuery.Include(x => x.FAQQuestions).Load();
returnQuery.Include(x => x.FAQServices).Load();
returnQuery.Include(x => x.ServiceItemServices.Where(s => s.IsActive)).ThenInclude(s => s.ServiceItemServicePrices).Load();

return returnQuery;