I've been fooling around with some LINQ over Entities and I'm getting strange results and I would like to get an explanation...
Given the following LINQ query,
// Sample # 1
IEnumerable<GroupInformation> groupingInfo;
groupingInfo = from a in context.AccountingTransaction
group a by a.Type into grp
select new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
};
I get the following SQL query (taken from SQL Profiler):
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [Type],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[Type] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[AccountingTransaction] AS [Extent1]
GROUP BY [Extent1].[Type]
) AS [GroupBy1]
So far so good.
If I change my LINQ query to:
// Sample # 2
groupingInfo = context.AccountingTransaction.
GroupBy(a => a.Type).
Select(grp => new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
});
it yields to the exact same SQL query. Makes sense to me.
Here comes the interesting part... If I change my LINQ query to:
// Sample # 3
IEnumerable<AccountingTransaction> accounts;
IEnumerable<IGrouping<object, AccountingTransaction>> groups;
IEnumerable<GroupInformation> groupingInfo;
accounts = context.AccountingTransaction;
groups = accounts.GroupBy(a => a.Type);
groupingInfo = groups.Select(grp => new GroupInformation()
{
GroupName = grp.Key,
GroupCount = grp.Count()
});
the following SQL is executed (I stripped a few of the fields from the actual query, but all the fields from the table (~ 15 fields) were included in the query, twice):
SELECT
[Project2].[C1] AS [C1],
[Project2].[Type] AS [Type],
[Project2].[C2] AS [C2],
[Project2].[Id] AS [Id],
[Project2].[TimeStamp] AS [TimeStamp],
-- <snip>
FROM ( SELECT
[Distinct1].[Type] AS [Type],
1 AS [C1],
[Extent2].[Id] AS [Id],
[Extent2].[TimeStamp] AS [TimeStamp],
-- <snip>
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT DISTINCT
[Extent1].[Type] AS [Type]
FROM [dbo].[AccountingTransaction] AS [Extent1] ) AS [Distinct1]
LEFT OUTER JOIN [dbo].[AccountingTransaction] AS [Extent2] ON [Distinct1].[Type] = [Extent2].[Type]
) AS [Project2]
ORDER BY [Project2].[Type] ASC, [Project2].[C2] ASC
Why are the SQLs generated are so different? After all, the exact same code is executed, it's just that sample # 3 is using intermediate variables to get the same job done!
Also, if I do:
Console.WriteLine(groupingInfo.ToString());
for sample # 1 and sample # 2, I get the exact same query that was captured by SQL Profiler, but for sample # 3, I get:
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Linq.IGrouping`2[System.Object,TestLinq.AccountingTransaction],TestLinq.GroupInformation]
What is the difference? Why can't I get the SQL Query generated by LINQ if I split the LINQ query in multiple instructions?
The ulitmate goal is to be able to add operators to the query (Where, OrderBy, etc.) at run-time.
BTW, I've seen this behavior in EF 4.0 and EF 6.0.
Thank you for your help.
The reason is because in your third attempt you're referring to accounts
as IEnumerable<AccountingTransaction>
which will cause the query to be invoked using Linq-To-Objects (Enumerable.GroupBy
and Enumerable.Select
)
On the other hand, in your first and second attempts the reference to AccountingTransaction
is preserved as IQueryable<AccountingTransaction>
and the query will be executed using Linq-To-Entities which will then transform it to the appropriate SQL statement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With