Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LINQ to Entities equivalent of sql "TOP(n) WITH TIES"

I have been searcing for LINQ equivalent of WITH TIES in sql server lately, I came across a couple things, which couldn't proove to be useful.

I know this question was asked before and has an accepted answer, but it doesn't work the way with ties does. The solution using GroupBy() doesn't result as expected for TOP(3) WITH TIES considering a data set consisting of {3 2 2 1 1 0} the result set will be {3 2 2 1 1} where it should be {3 2 2}

Using the following sample data (taken from this question):

CREATE TABLE Person
(
    Id int primary key,
    Name nvarchar(50),
    Score float
)    

INSERT INTO Person VALUES (1, 'Tom',8.9)
INSERT INTO Person VALUES (2, 'Jerry',8.9)
INSERT INTO Person VALUES (3, 'Sharti',7)
INSERT INTO Person VALUES (4, 'Mamuzi',9)
INSERT INTO Person VALUES (5, 'Kamala',9)

Traditional OrderByDescending(p => p.Score).Take(3) will result with: Mamuzi, Kamala and one of Tom (or Jerry) where it should include BOTH

I know there is no built-in equivalent of it and i've found a way to implement it. I don't know if it is the best way to do it and open for alternative solutions.

like image 638
Saro Taşciyan Avatar asked Mar 03 '14 20:03

Saro Taşciyan


3 Answers

var query = (from q in list.OrderByDescending(s => s.Score).Take(3).Select(s => s.Score).Distinct()
             from i in list
             where q == i.Score
             select i).ToList();

Edit:

@Zefnus

I wasn't sure in which order you wanted it but to change the order you can put a OrderBy(s => s.Score) between select i and ToList()

I don't have the possibility to check what sql statement my linq clause would produce. But your answer is much better i think. And your question was also really good. I never thought about top with ties in linq. ;)

Basically it only takes top 3 scores from the first list and compares them with the whole list and i takes only those scores which are equal to the scores of the first list.

like image 192
Pumpkin Avatar answered Sep 29 '22 21:09

Pumpkin


Do not use IEnumerable<T> with anything touching a database!

A solution aimed at LinqToSql and LinqToEntities should not be using IEnumerable<T>. Your current self answer will result in every single person being selected from the database and then being queried in memory using LinqToObjects.

To make a solution that is translated to SQL and executed by the database you have to use IQueryable<T> and Expressions instead.

public static class QueryableExtensions
{
    public static IQueryable<T> TopWithTies<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (topBy == null) throw new ArgumentNullException("topBy");
        if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));

        var topValues = source.OrderBy(topBy)
                              .Select(topBy)
                              .Take(topCount);

        var queryableMaxMethod = typeof(Queryable).GetMethods()
                                                  .Single(mi => mi.Name == "Max" &&
                                                                mi.GetParameters().Length == 1 &&
                                                                mi.IsGenericMethod)
                                                  .MakeGenericMethod(typeof(TComparand));

        var lessThanOrEqualToMaxTopValue = Expression.Lambda<Func<T, bool>>(
            Expression.LessThanOrEqual(
                topBy.Body,
                Expression.Call(
                    queryableMaxMethod,
                    topValues.Expression)),
            new[] { topBy.Parameters.Single() });

        var topNRowsWithTies = source.Where(lessThanOrEqualToMaxTopValue)
                                     .OrderBy(topBy);
        return topNRowsWithTies;
    }

    public static IQueryable<T> TopWithTiesDescending<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (topBy == null) throw new ArgumentNullException("topBy");
        if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));

        var topValues = source.OrderByDescending(topBy)
                              .Select(topBy)
                              .Take(topCount);

        var queryableMinMethod = typeof(Queryable).GetMethods()
                                                  .Single(mi => mi.Name == "Min" &&
                                                                mi.GetParameters().Length == 1 &&
                                                                mi.IsGenericMethod)
                                                  .MakeGenericMethod(typeof(TComparand));

        var greaterThanOrEqualToMinTopValue = Expression.Lambda<Func<T, bool>>(
            Expression.GreaterThanOrEqual(
                topBy.Body,
                Expression.Call(queryableMinMethod,
                                topValues.Expression)),
            new[] { topBy.Parameters.Single() });

        var topNRowsWithTies = source.Where(greaterThanOrEqualToMinTopValue)
                                     .OrderByDescending(topBy);
        return topNRowsWithTies;
    }
}

This creates queries of the following form:

SELECT [t0].[Id], [t0].[Name], [t0].[Score]
FROM [Person] AS [t0]
WHERE [t0].[Score] >= ((
    SELECT MIN([t2].[Score])
    FROM (
        SELECT TOP (3) [t1].[Score]
        FROM [Person] AS [t1]
        ORDER BY [t1].[Score] DESC
        ) AS [t2]
    ))
ORDER BY [t0].[Score] DESC

That query is only about 50% worse than the baseline query:

SELECT TOP (3) WITH TIES
    [t0].[Id], 
    [t0].[Name], 
    [t0].[Score]
FROM 
    [Person] AS [t0]
ORDER BY [t0].[Score] desc

With a data set consisting of your original 5 records and an additional 10000 records all with scores less than the original both of these are more or less instant (less than 20 milliseconds).

The IEnumerable<T> approach took a whole 2 minutes!


If the expression building and reflection seems scary the same thing can be achieved with a join:

public static IQueryable<T> TopWithTiesDescendingJoin<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
{
    if (source == null) throw new ArgumentNullException("source");
    if (topBy == null) throw new ArgumentNullException("topBy");
    if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));

    var orderedByValue = source.OrderByDescending(topBy);
    var topNValues = orderedByValue.Select(topBy).Take(topCount).Distinct();
    var topNRowsWithTies = topNValues.Join(source, value => value, topBy, (x, row) => row);
    return topNRowsWithTies.OrderByDescending(topBy);
}

With the following query as the result (with about the same performance):

SELECT [t3].[Id], [t3].[Name], [t3].[Score]
FROM (
    SELECT DISTINCT [t1].[Score]
    FROM (
        SELECT TOP (3) [t0].[Score]
        FROM [Person] AS [t0]
        ORDER BY [t0].[Score] DESC
        ) AS [t1]
    ) AS [t2]
INNER JOIN [Person] AS [t3] ON [t2].[Score] = [t3].[Score]
ORDER BY [t3].[Score] DESC
like image 24
Johnbot Avatar answered Sep 29 '22 20:09

Johnbot


Another solution - which probably is not as efficient as the other solution - is to get TOP(3) Scores and get the rows with Score values contained in the TOP(3).

We can use Contains() as follows;

orderedPerson = datamodel.People.OrderByDescending(p => p.Score);

topPeopleList =
(
    from p in orderedPerson 
    let topNPersonScores = orderedPerson.Take(n).Select(p => p.Score).Distinct()
    where topNPersonScores.Contains(p.Score)
    select p
).ToList();

What's good about this implementation is that it's extension method TopWithTies() can be implemented easly as;

public static IEnumerable<T> TopWithTies<T, TResult>(this IEnumerable<T> enumerable, Func<T, TResult> selector, int n)
{
    IEnumerable<T> orderedEnumerable = enumerable.OrderByDescending(selector);

    return
    (
        from p in orderedEnumerable
        let topNValues = orderedEnumerable.Take(n).Select(selector).Distinct()
        where topNValues.Contains(selector(p))
        select p
    );
}
like image 38
Saro Taşciyan Avatar answered Sep 29 '22 21:09

Saro Taşciyan