Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Entity Framework appears to be needlessly joining the same table twice

Update This may already be fixed: http://entityframework.codeplex.com/workitem/486

...

A fairly straightforward LINQ statement against my entities is resulting in unnecessarily complex SQL. More on that later, here's the setup:

Tables

Publication

  • PublicationId (pk)
  • TopicId (fk to a Topic table)
  • ReceiptCount (denormalized for query performance)
  • DateInserted

Receipt

  • ReceiptId (pk)
  • PublicationId (fk to the table above)
  • DateInserted

LINQ

var query = from r in context.Receipts.Include("Publication")
            where r.DateInserted < lagDate
            && r.ReceiptId > request.AfterReceiptId
            && r.Publication.TopicId == topicEntity.TopicId
            && r.Publication.ReceiptCount > 1
            select r;

SQL

exec sp_executesql N'SELECT TOP (25) 
[Project1].[ReceiptId] AS [ReceiptId], 
[Project1].[PublicationId] AS [PublicationId], 
[Project1].[DateInserted] AS [DateInserted], 
[Project1].[DateReceived] AS [DateReceived], 
[Project1].[PublicationId1] AS [PublicationId1], 
[Project1].[PayloadId] AS [PayloadId], 
[Project1].[TopicId] AS [TopicId], 
[Project1].[BrokerType] AS [BrokerType], 
[Project1].[DateInserted1] AS [DateInserted1], 
[Project1].[DateProcessed] AS [DateProcessed], 
[Project1].[DateUpdated] AS [DateUpdated], 
[Project1].[PublicationGuid] AS [PublicationGuid], 
[Project1].[ReceiptCount] AS [ReceiptCount]
FROM ( SELECT 
    [Extent1].[ReceiptId] AS [ReceiptId], 
    [Extent1].[PublicationId] AS [PublicationId], 
    [Extent1].[DateInserted] AS [DateInserted], 
    [Extent1].[DateReceived] AS [DateReceived], 
    [Extent3].[PublicationId] AS [PublicationId1], 
    [Extent3].[PayloadId] AS [PayloadId], 
    [Extent3].[TopicId] AS [TopicId], 
    [Extent3].[BrokerType] AS [BrokerType], 
    [Extent3].[DateInserted] AS [DateInserted1], 
    [Extent3].[DateProcessed] AS [DateProcessed], 
    [Extent3].[DateUpdated] AS [DateUpdated], 
    [Extent3].[PublicationGuid] AS [PublicationGuid], 
    [Extent3].[ReceiptCount] AS [ReceiptCount]
    FROM   [dbo].[Receipt] AS [Extent1]
    INNER JOIN [dbo].[Publication] AS [Extent2] ON [Extent1].[PublicationId] = [Extent2].[PublicationId]
    LEFT OUTER JOIN [dbo].[Publication] AS [Extent3] ON [Extent1].[PublicationId] = [Extent3].[PublicationId]
    WHERE ([Extent2].[ReceiptCount] > 1) AND ([Extent1].[DateInserted] < @p__linq__0) AND ([Extent1].[ReceiptId] > @p__linq__1) AND ([Extent2].[TopicId] = @p__linq__2)
)  AS [Project1]
ORDER BY [Project1].[ReceiptId] ASC',N'@p__linq__0 datetime,@p__linq__1 int,@p__linq__2 int',@p__linq__0='2012-09-05 19:39:21:510',@p__linq__1=4458824,@p__linq__2=90

Problem

Publication gets joined twice:

  1. LEFT OUTER JOIN because of .Include("Publication")
  2. INNER JOIN because of the where.

If I remove [Extent2] from the SQL entirely, and change the WHERE bits to use [Extent3], I get the same results back. Since I'm not using Lazy Loading on my entities, I have to .Include("Publication")... is there any solution for this?

I was using EF4, but grabbed EF5 from NuGet to see if it was perhaps fixed, but it produces the same result (although I have no idea how to tell if my EDMX is really using EF5).

like image 755
Langdon Avatar asked Sep 05 '12 20:09

Langdon


People also ask

Can we join same table twice?

As you may know, it is used to join and combine data from two or more tables into one common data set. In this article, I'm going to discuss special types of joins? in which you combine the same table twice—including joining a table to itself, also known as the self join.

How do I query the same table twice?

You use a single table twice in a query by giving it two names, like that. The aliases are often introduced with the keyword AS. You also normally specify a join condition (for without it, you get the Cartesian Product of the table joined with itself). For preference you use the explicit JOIN notation.

Can we use join on same table?

You use self-join to create a result set that joins the rows with the other rows within the same table. Because you cannot refer to the same table more than one in a query, you need to use a table alias to assign the table a different name when you use self-join.

How do I join multiple tables in Entity Framework?

The LINQ join operator allows us to join multiple tables on one or more columns (multiple columns). By default, they perform the inner join of the tables. We also learn how to perform left joins in Entity Framework by using the join operator & DefaultIfEmpty method.


2 Answers

There is however, a work-around. It may not be the most elegant solution, but it does exactly what you want; it generates only one join.

Change:

var query = from r in context.Receipts.Include("Publication")    
            where r.DateInserted < lagDate 
            && r.ReceiptId > request.AfterReceiptId 
            && r.Publication.TopicId == topicEntity.TopicId 
            && r.Publication.ReceiptCount > 1 
            select r; 

To be:

var query = from r in context.Receipts
            join pub in context.Publication on r.PublicationId equals pub.PublicationId
            where r.DateInserted < lagDate 
            && r.ReceiptId > request.AfterReceiptId 
            && pub.TopicId == topicEntity.TopicId 
            && pub.ReceiptCount > 1 
            select new {
                Receipt = r,
                Publication = pub
            }; 

Note that we have removed the Include AND we are no longer using r.Publication.?? in the where clause. Instead we are using pub.??

Now when you loop through query, you will see that r.Publication is not null:

foreach ( var item in query)
{
    //see that item.Publication is not null
    if(item.Receipt != null && item.Receipt.Publication != null)
    {
        //do work based on a valid Publication
    }
    else
    {
        //do work based on no linked Publication
    }
}
like image 122
saml Avatar answered Oct 17 '22 10:10

saml


This behavior can be avoided by using temporary variables (eg, let pub = r.Publication).

var query = from r in context.Receipts
            let pub = r.Publication // using a temp variable
            where r.DateInserted < lagDate
            && r.ReceiptId > request.AfterReceiptId
            && pub.TopicId == topicEntity.TopicId
            && pub.ReceiptCount > 1
            select new { r, pub };
like image 30
VahidN Avatar answered Oct 17 '22 12:10

VahidN