How to make EF eager load a collection navigation property through a GroupJoin?

Question

I am trying to GroupJoin some data with an IQueryable and project that data into an anonymous type. The original entity that I am GroupJoining onto has an ICollection navigation property (ie. one:many). I want to eager load that property so I can access it after the group join without EF going back to the DB. I know that Include() doesn't work when you use a GroupJoin, but the following code is the only way I have found to make it eager load the collection (ContactRoomRoles):

using (var context = new MyDbContext()) {
    var foundRooms = context.Rooms.Include(rm => rm.ContactRoomRoles);
    foundRooms.ToList();  // <-- Required to make EF actually load ContactRoomRoles data!

    var roomsData = foundRooms
        .GroupJoin(
            context.Contacts,
            rm => rm.CreatedBy,
            cont => cont.Id,
            (rm, createdBy) => new {
                ContactRoomRoles = rm.ContactRoomRoles,
                Room = rm,
                CreatedBy = createdBy.FirstOrDefault()
            }
        )
        .ToList();

    var numberOfRoles1 = roomsData.ElementAt(1).Room.ContactRoomRoles.Count();
    var numberOfRoles2 = roomsData.ElementAt(2).Room.ContactRoomRoles.Count();
    var numberOfRoles3 = roomsData.ElementAt(3).Room.ContactRoomRoles.Count();
}

If I remove the foundRooms.ToList(), EF goes off to the database 3 times to populate my numberOfRoles variables at the end, but with foundRooms.ToList() it doesn't - it just eager loads the data in one query upfront.

Although this works, it feels like a total hack. I'm just calling .ToList() for the side-effect of making EF actually load the collection data. If I comment that line out, it goes to the database any time I try to access ContactRoomRoles. Is there a less hacky way to make EF eager load that navigation property?

NOTE: I want to use the navigation property rather than projecting it into a new property of the anonymous type because AutoMapper wants to access Room.ContactRoomRoles when it's mapping onto a DTO object.

Roman Pokrovskij · Accepted Answer

This is not a hack. This is an abstraction leak. We should be ready to meet abstraction leaks using ORM tools (and any other internal DSL).

After ToList() you not only execute actual sql call (and load data into memory) but also cross to other Linq flavor - "Linq for objects". After this all your calls of Count() doesn't generate sql just because you start working with in memory collections (not with expression trees those are hidden by IQueryable - the return type of GroupBy statement, but with List collection - return type of ToList).

Without ToList() you stay with "Linq for sql" and EF will translate each call of Count() on IQuerybale to sql; Three Conut() call = three underlined Sql statements.

There are no way to avoid this, otherwise then to calculate all count(*) values on server side in one complex query. If you will try to write such query with Linq (constructing expression tree) - you will meet abstraction leak again. ORM tool is designed to map objects to "RDBS entities" staying with CRUD (Create Read Update Delete) operations - if statement become more complex - you will be not able to foresee generated sql (and all runtime exceptions like 'can't generate sql for such linq'). So do not use linq for complex 'report like' queries (in some cases you could - it depends on your re-usage requirements and testing possibilities). Use old good SQL and call it through ADO or EF ADO "sql extensions" like EF Core FromSql:

var blogs = context.Blogs
    .FromSql("EXECUTE dbo.GetMostPopularBlogsForUser {0}", user)
    .ToList();

Update: it is a good recommendation also to avoid using lazy loading and manual entities loading if you are not working on reusable EF tools. They are in some sense opposite to linq queries - expression trees. They were important (if not only one) option to achieve referenced entities loading on "old" platforms where were no "expression trees" in language but in .NET/EF where full queries can be written "declarative way" as expression trees without execution (but with postponed interpretation) there should be very strong reason to return back to "manual" loading.

Gert Arnold · Answer

It's all about collections that are marked as loaded, or not.

The line

foundRooms.ToList();

(or foundRooms.Load())

loads all Rooms and their ContactRoomRoles collections into the context. Since the Include statement is used, these collections are marked as loaded by EF. You can check that by looking at

context.Entry(Rooms.Local.First()).Collection(r => r.ContactRoomRoles).IsLoaded

which should return true.

If you omit the line foundRooms.ToList();, each time a Room.ContactRoomRoles collection is accessed, EF will notice it's not marked as loaded yet and will lazy-load it. After that, the collection is marked as loaded, but it took an extra query.

A collection is only marked as loaded when it is -

Include-ed
loaded by lazy loading

loaded by the Load() statement, as in

context.Entry(Rooms.Local.First()).Collection(r => r.ContactRoomRoles).Load();

Not when it is part of a projection into another property (like the part ContactRoomRoles = rm.ContactRoomRole in your query).

However, after the statement var roomsData = foundRooms (...).ToList() all Room.ContactRoomRoles are populated, because the query did load them into the context, and EF's always executes the relationship fixup process, which auto-populates navigation properties.

So, to summarize, after your query you have roomsData containing room objects with ContactRoomRoles collections that are populated but not marked as loaded.

Knowing this, it's apparent now that the only thing to do is: prevent lazy loading to occur.

The best way to achieve that is to prevent EF from creating entity objects that are capable of lazy loading, aka proxies. You do that by adding the line

context.Configuration.ProxyCreationEnabled = false;

just below the using statement.

Now you'll notice that the line

var numberOfRoles1 = roomsData.ElementAt(1).Room.ContactRoomRoles.Count();

doesn't trigger an extra query, but does return the correct count.

Barr J · Answer

This is called an Abstraction Leak and it means your abstraction exposes some implementation details.

This is happening when you call the .ToList() and you switch (I don't like the word cross) between Linq to sql and Linq to objects.

I'd recommend you to read The Law of Leaky Abstractions to get the grasp better, as it is quite complicated to explain on one foot.

The main idea behind it is, that everything will work as planned but slower then usual when you attempt to provide a complete abstraction of an underlying unreliable layer, but sometimes, the layer leaks through the abstraction and you feel the things that the abstraction can’t quite protect you from.

Edit to clarify:

calling ToList() forces linq-to-entities to evaluate and return the results as a list.

Meaning that, for example from the answer above:

var blogs = context.Blogs
    .FromSql("EXECUTE dbo.GetMostPopularBlogsForUser {0}", user)
    .ToList();

Will be evaluation to the corresponding model of the context - blogs model.

So in other words, it is being lazily executed at the moment that you call ToList().

Prior to the ToList() call, C# does NO SQL calls. So actually, it is NOT an in-memory operation.

So yes, it is putting that data into memory as part of the context and reads it in the same context.

How to make EF eager load a collection navigation property through a GroupJoin?

Tags:

c#

database

sql-server

entity-framework

Jez

3 Answers

Roman Pokrovskij

Gert Arnold

Barr J

Recent Activity

Donate For Us

How to make EF eager load a collection navigation property through a GroupJoin?

Tags:

c#

database

sql-server

entity-framework

Jez

3 Answers

Roman Pokrovskij

Gert Arnold

Barr J

Related questions

Recent Activity

Donate For Us