Given, for example, the following data set:
new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 4, Group = 1, Value = "LOP", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 6, Group = 2, Value = "PEO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
new Entity { Id = 9, Group = 2, Value = "QPA", ... },
new Entity { Id = 10, Group = 2, Value = "LAN", ... },
// ... millions more records
how can I make a query, which is efficient (avoids the N+1 query problem) and gives me the top 3 records for each Group
ordered by Value
?
new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
// ...
Most of the LINQ or Entity Framework solutions on Stack Overflow use GroupBy
with Take
which is evaluated client side (which means that all records are imported in memory and then the grouping happens outside of the database).
I've tried with:
var list = await _dbContext.Entities
.Select(x => new
{
OrderKey = _dbContext.Entities.Count(y =>
x.Group == y.Group
&& y.Value < x.Value),
Value = x,
})
.Where(x => x.OrderKey < 3)
.OrderBy(x => x.OrderKey)
.Select(x => x.Value)
.ToListAsync(cancellationToken);
but I'm pretty sure that's as inefficient as it gets.
How can I extract this logic into an extension method for IQueryable<T>
which returns IQueryable<T>
?
Interesting question. The main problem I see is that there is no standard SQL construct for performing such operation - most of the databases provide their own operators for working with rowset "window", like SqlServer's SELECT - OVER etc. There is also no "standard" LINQ operator / pattern for that.
Given
IQueryable<Entity> source
the typical way of performing such operation in LINQ is
var query = source.GroupBy(e => e.Group)
.SelectMany(g => g.OrderBy(e => e.Value).Take(3));
which EF6 translates to the following SQL
SELECT
[Limit1].[Id] AS [Id],
[Limit1].[Group] AS [Group],
[Limit1].[Value] AS [Value]
FROM (SELECT DISTINCT
[Extent1].[Group] AS [Group]
FROM [dbo].[Entity] AS [Extent1] ) AS [Distinct1]
CROSS APPLY (SELECT TOP (3) [Project2].[Id] AS [Id], [Project2].[Group] AS [Group], [Project2].[Value] AS [Value]
FROM ( SELECT
[Extent2].[Id] AS [Id],
[Extent2].[Group] AS [Group],
[Extent2].[Value] AS [Value]
FROM [dbo].[Entity] AS [Extent2]
WHERE [Distinct1].[Group] = [Extent2].[Group]
) AS [Project2]
ORDER BY [Project2].[Value] ASC ) AS [Limit1]
I can't say if it's good or bad translation, but at least it is some translation. The important is that EF Core currently (latest 2.2.3 at the time of writing) cannot translate it to SQL and will use client evaluation (as you mentioned).
So currently there seem to be only 3 translatable LINQ ways of writing such query:
(1) (yours)
var query = source.Where(e => source.Count(
e2 => e2.Group == e.Group && e2.Value.CompareTo(e.Value) < 0) < 3);
translates to
SELECT [e].[Id], [e].[Group], [e].[Value]
FROM [Entity] AS [e]
WHERE (
SELECT COUNT(*)
FROM [Entity] AS [e2]
WHERE ([e2].[Group] = [e].[Group]) AND [e2].[Value] < [e].[Value]
) < 3
(2)
var query = source.Where(e => source.Where(e2 => e2.Group == e.Group)
.OrderBy(e2 => e2.Value).Take(3).Contains(e));
translates to
SELECT [e].[Id], [e].[Group], [e].[Value]
FROM [Entity] AS [e]
WHERE [e].[Id] IN (
SELECT TOP(3) [e2].[Id]
FROM [Entity] AS [e2]
WHERE [e2].[Group] = [e].[Group]
ORDER BY [e2].[Value]
)
(3)
var query = source.SelectMany(e => source.Where(e2 => e2.Group == e.Group)
.OrderBy(e2 => e2.Value).Take(3).Where(e2 => e2.Id == e.Id));
translates to
SELECT [t].[Id], [t].[Group], [t].[Value]
FROM [Entity] AS [e]
CROSS APPLY (
SELECT TOP(3) [e2].[Id], [e2].[Group], [e2].[Value]
FROM [Entity] AS [e2]
WHERE [e2].[Group] = [e].[Group]
ORDER BY [e2].[Value]
) AS [t]
WHERE [t].[Id] = [e].[Id]
I can't say which one to choose - you have to measure the execution plans.
The main drawback of #1 the comparison operator (as it can be seen in the example - can't use <
for string
s, for Guid
s it's even worse), and also won't work correctly if the Value
is not unique inside the grouping.
From the other side it might be the fastest of the three. But it's possible the execution plan for #2 and #3 (and even #1) to be the same.
With that being said, I'm not going to provide a generalized method, since all these approaches require different parameters, with only common eventually being the group selector Expression<Func<T, TGroupKey>>
(e.g. e => e.Group
). But (especially for #2 and #3) it's possible to write such a method - it would need some manual Expression
manipulations, and overall I'm not sure it does worth the efforts
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With