Today I discovered that Entity Framework was adding an unnecessary sub query to the SQL it generates. I started digging my code trying to narrow down where it might come from. A (long) while later I pin-pointed what's causing it. But now I'm more confused than when I started, as I have no clue why it causes it.
Basically what I discovered is that on certain scenarios, simply converting a constant into a variable can alter the SQL that Entity Framework generates. I've shrunk everything to the bare minimum and packed it in a little console app:
using System;
using System.Data.Entity;
using System.Linq;
class Program
{
private static readonly BlogContext _db = new BlogContext();
static void Main(string[] args)
{
const string email = "[email protected]";
var comments = from c in _db.Comments
where c.Email == email
select c;
var result = (from p in _db.Posts
join c in comments on p.PostId equals c.PostId
orderby p.Title
select new { p.Title, c.Content });
Console.WriteLine(result);
}
}
public class BlogContext : DbContext
{
public DbSet<Post> Posts { get; set; }
public DbSet<Comment> Comments { get; set; }
}
public class Post
{
public int PostId { get; set; }
public string Title { get; set; }
}
public class Comment
{
public int CommentId { get; set; }
public int PostId { get; set; }
public string Email { get; set; }
public string Content { get; set; }
}
This shows the following output, which is perfect:
SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE N'[email protected]' = [Extent2].[Email]
ORDER BY [Extent1].[Title] ASC
Now if I make email
a variable:
/*const*/ string email = "[email protected]";
The output changes radically:
SELECT
[Project1].[PostId] AS [PostId],
[Project1].[Title] AS [Title],
[Project1].[Content] AS [Content]
FROM ( SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE [Extent2].[Email] = @p__linq__0
) AS [Project1]
ORDER BY [Project1].[Title] ASC
As a side note, LINQ to SQL does not seem to do this. I know it's probably okay to ignore this, as both commands return the same data. But I'm extremely curious as to why this happens. Up until today I always had the (perhaps false?) impression that it is always safe to turn a constant into a variable, providing that the value remains the same (which in this case does). So I have to ask...
Why a seemingly insignificant change causes such a big difference in the generated SQL?
Update:
Just to be clear, my question isn't about the value of email
being a hard-coded value in the first query and a variable in the second (which makes all the sense in the world). My question is about why the variable version results in the extra sub query.
Thanks!
The answer is fairly simple. Your LINQ query is expressed with expression trees. The difference with the const variable vs non const one are lies in ConstantExpression and ParameterExpression.
When you use const your LINQ query uses ConstExpression
for this variable, and when you use non const it uses ParameterExpression
which are interpreted differently by the EF Runtime.
Constant actually means that the value will never change and the value can be inlined into the query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With