Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does EF retrieve IDs unnecessarily when doing a one-to-many join?

Suppose you have an Author domain class, and a Book domain class and each Author can have 0 or more Books (a one-to-many relationship).

If you do the following:

var dtos = dbContext.Authors.Select(a => new
{
    Name = a.Name,
    BookNames = a.Books.Select(b => b.Name).ToList() 
}).ToList();

You expect the following SQL to get generated since you only requested the names of the authors, and the names of the books of each author:

SELECT [a].[Name], [b].[Name]
FROM [Authors] AS [a]
LEFT JOIN [Books] AS [b] ON [a].[Id] = [b].[AuthorId]
ORDER BY [a].[Id], [b].[Id]

However, instead, Entity Framework causes the following SQL to be generated:

SELECT [a].[Name], [a].[Id], [b].[Name], [b].[Id]
FROM [Authors] AS [a]
LEFT JOIN [Books] AS [b] ON [a].[Id] = [b].[AuthorId]
ORDER BY [a].[Id], [b].[Id]

which apparently is unnecessarily retrieving the IDs of all authors and the IDs of all their books.

This is while this doesn't happen when you do the joining from the other side, that is, from books to authors. For example, when you request the names of the books, and the names of their authors, like:

var dtos = dbContext.Books.Select(b => new
{
    Name = b.Name,
    BookNames = b.Author == null ? null : b.Author.Name
}).ToList();

The generated SQL for the above code would be:

SELECT [b].[Name], CASE
    WHEN [a].[Id] IS NULL THEN CAST(1 AS bit)
    ELSE CAST(0 AS bit)
END, [a].[Name]
FROM [Books] AS [b]
LEFT JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

which is not retrieving the IDs, as expected.

Why is this the case? Why does EF Core retrieve the IDs even though I didn't include them in my projection?

I have tested this with EF Core 3.x + 5.0 RC1. If that matters.

like image 571
Arad Avatar asked Jan 25 '23 17:01

Arad


1 Answers

The SQL query result set is flat. The one you expect would have been generated (and indeed is generated when you doing it from other side) if your query was flat as well, like this

var query =
    from a in dbContext.Authors
    from b in a.Books.DefaultIfEmpty()
    select new { Name = a.Name, BookName = b.Name };

However your query result shape is different - name + correlated list of book names. In general SQL cannot provide such shape, so EF Core should somehow convert (group) the returned flat result set client side. The records come ordered by the two ids, but these two ids are also needed for correct grouping. Well, technically speaking, all ids except the last in order, so [b].[Id] in this case is redundant. So they can group them sequentially (without buffering the whole result set in memory like the regular LINQ to Objects GroupBy) with something like this (in pseudo-code):

var enumerator = query.GetEnumerator();
bool hasNext = enumerator.MoveNext();
while (hasNext)
{
    var aId = enumerator.Current["AId"]; // <-- here is where it is needed
    var aName = enumerator.Current["AName"];
    var bNames = new List<string>();
    do
    {
        bNames.Add(enumerator.Current["BName"]);
        hasNext = enumerator.MoveNext();
    }
    while (hasNext && aId == enumerator.Current["AId"]); // <-- as well here
    // ready to produce record for the consumer
    yield return new { Name = aName, BookNames = bNames };
}
like image 72
Ivan Stoev Avatar answered Feb 16 '23 03:02

Ivan Stoev