Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Entity Framework Core 3.0 performance impact for including collection navigation properties (cartesian explosion)

We're facing a major performance problem after upgrading EF Core 2.2 to EF Core 3.0. Imagine a simple data model with a single collection navigation property and hundreds of fields (the reality looks even darker):

public class Item
{
  [Key]
  public int ItemID {get;set;}

  public ICollection<AddInfo> AddInfos {get;set;}
  ...  // consisting of another 100+ properties!
}

and

public class AddInfo
{
  [Key]
  public int AddInfoID {get;set;}
  public int? ItemID {get;set;}
  public string SomePayload {get;set;}
}

During item retrieval, we're querying as followed:

...
var myQueryable = this._context.Items.Include(i => i.AddInfos).Where(**some filter**);
... // moar filters
var result = myQueryable.ToList();

Straight forward, up until this point.

In EF 2.2, fetching that queryable results in two separate queries, one for Item and one for the AddInfo - level. These queries usually fetch 10.000 items and around 250.000 AddInfos.

In EF Core 3.0 however, a single query is being generated, left-joining AddInfo to Item that on first glance appears to be the better option. Our Item however needs to be fetched with all 100+ fields, which is why projecting to a smaller class or anonymous type (adding a call to the .Select(...)-method) isn't feasible. Therefore, the result set has so much redundancy in it (each Item approx. 25 times) that the query itself takes too long to run in an acceptable time.

Does EF-Core 3.0 provide any option that would enable us to switch back to the query-behavior of the good old EF Core 2.2 times without extensive changes to our data model? We're already profiting from this change in other parts of the application, but not in this particular scenario.

Many thanks in advance!

Update

After further investigation I found that this issue is already adressed with Microsoft here and out of the box, there seems to be no way to configure the split query execution.

like image 318
TheSchmu Avatar asked Dec 03 '19 17:12

TheSchmu


People also ask

What is Cartesian explosion?

Cartesian Explosion problem is when you load more and more one-to-many relationships, the amount of duplicate data grows and large number of rows are returned. This will have a large impact on the performance of your application. As you can see from the example, this could get really large very quickly.

Is Entity Framework core fast?

Entity Framework (EF) Core, Microsoft's object-to-database mapper library for . NET Framework, brings performance improvements for data updates in version 7, Microsoft claims. The performance of SaveChanges method in EF7 is up to 74% faster than in EF6, in some scenarios. While EF is widely used in .

What is lazy loading EF core?

Lazy loading means that the related data is transparently loaded from the database when the navigation property is accessed.


1 Answers

As per the update of my initial question, the insights went so far as to assure myself that currenty, there is in fact no built in configuration to return to split query execution.

However, MS provided code samples on how to do this with minimal code changes (for our use case!) here.

We're simply removing the .Include(...) calls to collection navigation properties (1:n relations in our case, 1:1 relations are not affected!). After fetching the items, we're simply making another call using:

...
var myQueryable = this._context.Items.Where(**some filter**);
... // moar filters
var result = myQueryable.ToList();
...
var addInfos = myQueryable.Include(x => x.AddInfos).SelectMany(x => x.AddInfos).Select(x => new {x.ItemID, x}).ToList();

This fetches the collection navigation property entities and - if change tracking is enabled - automatically fills the collections on the individual items in the result variable.

like image 87
TheSchmu Avatar answered Sep 22 '22 20:09

TheSchmu