
LINQ with Querying "Memory"

Tags:

c#

.net

linq

Does LINQ have a way to "memorize" its previous query results while querying?

Consider the following case:

public class Foo {
    public int Id { get; set; }
    public ICollection<Bar> Bars { get; set; }
}

public class Bar {
    public int Id { get; set; }
}

Now, if two or more Foo have the same collection of Bar (regardless of order), they are considered similar.

Example:

foo1.Bars = new List<Bar>() { bar1, bar2 };
foo2.Bars = new List<Bar>() { bar2, bar1 };
foo3.Bars = new List<Bar>() { bar3, bar1, bar2 };

In the above case, foo1 is similar to foo2, but neither foo1 nor foo2 is similar to foo3.

Suppose we have a query result that is an IEnumerable or IOrderedEnumerable of Foo. From that query, we need to find the first N Foo that are not similar to one another.

This task seems to require a memory of the collection of bars which have been chosen before.

With partial LINQ we could do it like this:

private bool areBarsSimilar(ICollection<Bar> bars1, ICollection<Bar> bars2) {
    return bars1.Count == bars2.Count && //same number of bars
        !bars1.Select(x => x.Id)
        .Except(bars2.Select(y => y.Id))
        .Any(); //and no Id in bars1 is missing from bars2, so the bars are similar
}

public void somewhereWithQueryResult(){
    .
    .
    List<Foo> topNFoos = new List<Foo>(); //this serves as a memory for the previous query
    int N = 50; //can be any number
    foreach (var q in query) { //query is IOrderedEnumerable or IEnumerable
        if (topNFoos.Count == 0 || !topNFoos.Any(foo => areBarsSimilar(foo.Bars, q.Bars)))
            topNFoos.Add(q);
        if (topNFoos.Count >= N) //We have had enough Foo
            break;
    }
}

The topNFoos list serves as a memory of the previous results, so in the foreach loop we can skip any Foo q whose Bars are identical to those of any Foo already in topNFoos.

My question is, is there any way to do that in LINQ (fully LINQ)?

var topNFoos = from q in query
               //put something
               select q;

If the "memory" required comes from the current query item q or from a variable outside the query, then we can use a let variable to cache it:

int index = 0;
var topNFoos = from q in query
               let qc = index++ + q.Id //depends on q or variable outside like index, then it is OK
               select q;

But if it must come from the previous querying of the query itself then things start to get more troublesome.

Is there any way to do that?


Edit:

(I am currently creating a test case (github link) for the answers. I am still figuring out how to test all the answers fairly.)

(Most of the answers below aim to solve my particular question and are good in themselves (Rob's, spender's, and David B's answers, which use IEqualityComparer, are particularly awesome). Nevertheless, if anyone can answer my more general question, "does LINQ have a way to "memorize" its previous query results while querying", I would also be glad.)

(Apart from the significant difference in performance between fully and partially LINQ solutions for the particular case I presented above, one answer that addresses my general question about LINQ memory is Ivan Stoev's. Another one with a good combination is Rob's. To make myself clearer: I am looking for a general and efficient solution using LINQ, if there is one.)

Ian asked Mar 10 '16 02:03


2 Answers

I'm not going to answer your question directly, but rather propose a method that efficiently filters the first N non-similar items.

First, consider writing an IEqualityComparer<Foo> that uses the Bars collection to measure equality. Here, I'm assuming that the lists might contain duplicate entries, so I use quite a strict definition of similarity:

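public class FooSimilarityComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo a, Foo b)
    {
        //called infrequently
        return a.Bars.OrderBy(bar => bar.Id).SequenceEqual(b.Bars.OrderBy(bar => bar.Id));
    }
    public int GetHashCode(Foo foo)
    {
        //called frequently
        //Enumerable.Sum on int is overflow-checked internally and throws
        //OverflowException even inside an unchecked block, so add manually
        unchecked
        {
            return foo.Bars.Aggregate(0, (hash, bar) => hash + bar.GetHashCode());
        }
    }
}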
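To illustrate why duplicates matter, here is a small sketch comparing the count-plus-Except check from the question with a sorted comparison. It compares by Bar.Id, which is an assumption (the comparer above compares Bar instances), and the names SimilarityDemo, CountAndExcept and SortedSequenceEqual are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Bar { public int Id { get; set; } }

public static class SimilarityDemo
{
    // The question's check: same Count, and no Id of `a` is missing from `b`.
    // Except works on distinct values, so repeated Ids are not counted.
    public static bool CountAndExcept(ICollection<Bar> a, ICollection<Bar> b) =>
        a.Count == b.Count &&
        !a.Select(x => x.Id).Except(b.Select(y => y.Id)).Any();

    // Stricter check: sort the Ids and compare them position by position,
    // so every Id must occur the same number of times on both sides.
    public static bool SortedSequenceEqual(ICollection<Bar> a, ICollection<Bar> b) =>
        a.Select(x => x.Id).OrderBy(id => id)
         .SequenceEqual(b.Select(y => y.Id).OrderBy(id => id));

    public static void Main()
    {
        var bar1 = new Bar { Id = 1 };
        var bar2 = new Bar { Id = 2 };

        var left = new List<Bar> { bar1, bar1, bar2 };
        var right = new List<Bar> { bar1, bar2, bar2 };

        Console.WriteLine(CountAndExcept(left, right));      // True
        Console.WriteLine(SortedSequenceEqual(left, right)); // False
    }
}
```

If the Bars collections can never contain duplicates, the two checks agree and the cheaper one is sufficient.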

You can really efficiently get the top N non-similar items by using a HashSet with the IEqualityComparer above:

IEnumerable<Foo> someFoos; //= some list of Foo
var hs = new HashSet<Foo>(new FooSimilarityComparer());
foreach(var f in someFoos)
{
    hs.Add(f); //hashsets don't add duplicates, as measured by the FooSimilarityComparer
    if(hs.Count >= 50)
    {
        break;
    }
}
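The loop above can probably also be written as a LINQ chain, since Enumerable.Distinct accepts an IEqualityComparer<T> and streams its results. The following is only a sketch: BarIdComparer and FirstDissimilarIds are hypothetical names, the comparer keys on Bar.Id rather than Bar instances, and it relies on Distinct yielding first occurrences in source order, which the current implementation does but the documentation does not promise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Bar { public int Id { get; set; } }

public class Foo
{
    public int Id { get; set; }
    public ICollection<Bar> Bars { get; set; } = new List<Bar>();
}

// Illustrative comparer: two Foo are equal when their sorted Bar Ids match.
public class BarIdComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo a, Foo b) =>
        a.Bars.Select(x => x.Id).OrderBy(id => id)
         .SequenceEqual(b.Bars.Select(x => x.Id).OrderBy(id => id));

    // Order-independent hash: wrapping sum of the Bar Ids.
    public int GetHashCode(Foo foo) =>
        foo.Bars.Aggregate(0, (hash, bar) => unchecked(hash + bar.Id));
}

public static class DistinctDemo
{
    // Distinct keeps an internal set of items already seen (the "memory");
    // Take(n) stops pulling from the source once n items have been produced.
    public static List<int> FirstDissimilarIds(IEnumerable<Foo> foos, int n) =>
        foos.Distinct(new BarIdComparer()).Take(n).Select(f => f.Id).ToList();

    public static void Main()
    {
        var bar1 = new Bar { Id = 1 };
        var bar2 = new Bar { Id = 2 };
        var bar3 = new Bar { Id = 3 };

        var foos = new List<Foo>
        {
            new Foo { Id = 1, Bars = new List<Bar> { bar1, bar2 } },
            new Foo { Id = 2, Bars = new List<Bar> { bar2, bar1 } },
            new Foo { Id = 3, Bars = new List<Bar> { bar3, bar1, bar2 } },
        };

        Console.WriteLine(string.Join(",", FirstDissimilarIds(foos, 2))); // 1,3
    }
}
```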

@Rob's approach is broadly similar, and shows how you can use the comparer directly in LINQ, but pay attention to the comments I made on his answer.

spender answered Sep 30 '22 20:09


So, it's ... possible. But this is far from performant code.

var res = query.Select(q => new {
    original = q, 
    matches = query.Where(innerQ => areBarsSimilar(q.Bars, innerQ.Bars))
}).Select(g => new { original = g, joinKey = string.Join(",", g.matches.Select(m => m.Id)) })
.GroupBy (g => g.joinKey)
.Select(g => g.First().original.original)
.Take(N);

This assumes that the Ids are unique for each Foo (you could also use their GetHashCode(), I suppose).

A much better solution is to either keep what you've done, or implement a custom comparer, as follows:


Note: As pointed out in the comments by @spender, the Equals and GetHashCode below will not work for collections with duplicates. Refer to their answer for a better implementation; the usage code would remain the same.
class MyComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo left, Foo right)
    {
        return left.Bars.Count == right.Bars.Count && //same number of bars
            !left.Bars.Select(x => x.Id)
            .Except(right.Bars.Select(y => y.Id))
            .Any(); //and no Id on the left is missing on the right
    }

    public int GetHashCode(Foo foo)
    {
        unchecked {
            int hc = 0;
            if (foo.Bars != null)
                foreach (var p in foo.Bars)
                    hc ^= p.Id.GetHashCode(); //hash the Id, since Equals compares Ids
            return hc;
        }
    }
}

And then your query becomes simply:

var res = query
    .GroupBy (q => q, new MyComparer())
    .Select(g => g.First())
    .Take(N);
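One caveat: GroupBy has to read the whole source before it yields its first group. For the more general "memory" question, a hand-written iterator method can carry its own state and still compose with LINQ. The sketch below is one possible way to do that, not a built-in operator: WhereUnlikePrevious is a hypothetical name, and the demo uses int arrays as stand-ins for lists of Bar Ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MemoryExtensions
{
    // Hypothetical helper: yields an item only if isSimilar returns false
    // against every item yielded so far. It streams, so a following Take(n)
    // stops enumerating the source early.
    public static IEnumerable<T> WhereUnlikePrevious<T>(
        this IEnumerable<T> source, Func<T, T, bool> isSimilar)
    {
        var accepted = new List<T>(); // the "memory" of earlier results
        foreach (var item in source)
        {
            if (accepted.Any(prev => isSimilar(prev, item)))
                continue;
            accepted.Add(item);
            yield return item;
        }
    }
}

public static class MemoryDemo
{
    // Similar when both arrays hold the same Ids, ignoring order.
    public static bool SameIds(int[] a, int[] b) =>
        a.OrderBy(x => x).SequenceEqual(b.OrderBy(x => x));

    public static void Main()
    {
        var barIdLists = new[]
        {
            new[] { 1, 2 },
            new[] { 2, 1 },
            new[] { 3, 1, 2 },
        };

        foreach (var ids in barIdLists.WhereUnlikePrevious(SameIds).Take(2))
            Console.WriteLine(string.Join(" ", ids));
        // prints "1 2" then "3 1 2"
    }
}
```

Each item is compared against everything accepted so far, so this costs about as much as the original foreach loop; the comparer-based approaches remain the faster choice when a suitable hash exists.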
Rob answered Sep 30 '22 20:09