Does LINQ
have a way to "memorize" its previous query results while querying?
Consider the following case:
public class Foo {
public int Id { get; set; }
public ICollection<Bar> Bars { get; set; }
}
public class Bar {
public int Id { get; set; }
}
Now, if two or more Foo
have same collection of Bar
(no matter what the order is), they are considered as similar Foo
.
Example:
foo1.Bars = new List<Bar>() { bar1, bar2 };
foo2.Bars = new List<Bar>() { bar2, bar1 };
foo3.Bars = new List<Bar>() { bar3, bar1, bar2 };
In the above case, foo1
is similar to foo2
but both foo1
and foo2
are not similar tofoo3
Given that we have a query
result consisting IEnumerable
or IOrderedEnumerable
of Foo
. From the query
, we are to find the first N
foo
which are not similar.
This task seems to require a memory of the collection of bars
which have been chosen before.
With partial LINQ
we could do it like this:
private bool areBarsSimilar(ICollection<Bar> bars1, ICollection<Bar> bars2) {
return bars1.Count == bars2.Count && //have the same amount of bars
!bars1.Select(x => x.Id)
.Except(bars2.Select(y => y.Id))
.Any(); //and when excepted does not return any element mean similar bar
}
public void somewhereWithQueryResult(){
.
.
List<Foo> topNFoos = new List<Foo>(); //this serves as a memory for the previous query
int N = 50; //can be any number
foreach (var q in query) { //query is IOrderedEnumerable or IEnumerable
if (topNFoos.Count == 0 || !topNFoos.Any(foo => areBarsSimilar(foo.Bars, q.Bars)))
topNFoos.Add(q);
if (topNFoos.Count >= N) //We have had enough Foo
break;
}
}
The topNFoos
List
will serve as a memory of the previous query and we can skip the Foo q
in the foreach
loop which already have identical Bars
with Any
of the Foo
in the topNFoos
.
My question is, is there any way to do that in LINQ
(fully LINQ
)?
var topNFoos = from q in query
//put something
select q;
If the "memory" required is from a particular query item q
or a variable outside of the query, then we could use let
variable to cache it:
int index = 0;
var topNFoos = from q in query
let qc = index++ + q.Id //depends on q or variable outside like index, then it is OK
select q;
But if it must come from the previous querying of the query itself then things start to get more troublesome.
Is there any way to do that?
Edit:
(I currently am creating a test case (github link) for the answers. Still figuring out how can I test all the answers fairly)
(Most of the answers below are aimed to solve my particular question and are in themselves good (Rob's, spender's, and David B's answers which use IEqualityComparer
are particularly awesome). Nevertheless, if there is anyone who can give answer to my more general question "does LINQ have a way to "memorize" its previous query results while querying", I would also be glad)
(Apart from the significant difference in performance for the particular case I presented above when using fully/partial LINQ, one answer aiming to answer my general question about LINQ memory is Ivan Stoev's. Another one with good combination is Rob's. As to make myself clearer, I look for general and efficient solution, if there is any, using LINQ)
I'm not going to answer your question directly, but rather, propose a method that will be fairly optimally efficient for filtering the first N non-similar items.
First, consider writing an IEqualityComparer<Foo>
that uses the Bars
collection to measure equality. Here, I'm assuming that the lists might contain duplicate entries, so have quite a strict definition of similarity:
public class FooSimilarityComparer:IEqualityComparer<Foo>
{
public bool Equals(Foo a, Foo b)
{
//called infrequently
return a.Bars.OrderBy(bar => bar.Id).SequenceEqual(b.Bars.OrderBy(bar => bar.Id));
}
public int GetHashCode(Foo foo)
{
//called frequently
unchecked
{
return foo.Bars.Sum(b => b.GetHashCode());
}
}
}
You can really efficiently get the top N
non-similar items by using a HashSet
with the IEqualityComparer above:
IEnumerable<Foo> someFoos; //= some list of Foo
var hs = new HashSet<Foo>(new FooSimilarityComparer());
foreach(var f in someFoos)
{
hs.Add(f); //hashsets don't add duplicates, as measured by the FooSimilarityComparer
if(hs.Count >= 50)
{
break;
}
}
@Rob s approach above is broadly similar, and shows how you can use the comparer directly in LINQ, but pay attention to the comments I made to his answer.
So, it's ... possible. But this is far from performant code.
var res = query.Select(q => new {
original = q,
matches = query.Where(innerQ => areBarsSimilar(q.Bars, innerQ.Bars))
}).Select(g => new { original = g, joinKey = string.Join(",", g.matches.Select(m => m.Id)) })
.GroupBy (g => g.joinKey)
.Select(g => g.First().original.original)
.Take(N);
This assumes that the Id
s are unique for each Foo (you could also use their GetHashCode()
, I suppose).
A much better solution is to either keep what you've done, or implement a custom comparer, as follows:
Equals
and GetHashCode
will not work for collections with duplicates. Refer to their answer for a better implementation - however, the usage code would remain the same
class MyComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo left, Foo right)
{
return left.Bars.Count() == right.Bars.Count() && //have the same amount of bars
left.Bars.Select(x => x.Id)
.Except(right.Bars.Select(y => y.Id))
.ToList().Count == 0; //and when excepted returns 0, mean similar bar
}
public int GetHashCode(Foo foo)
{
unchecked {
int hc = 0;
if (foo.Bars != null)
foreach (var p in foo.Bars)
hc ^= p.GetHashCode();
return hc;
}
}
}
And then your query becomes simply:
var res = query
.GroupBy (q => q, new MyComparer())
.Select(g => g.First())
.Take(N);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With