Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linq performance: Any vs. Contains [duplicate]

This question is related to this one, but not entirely the same I think.

Given:

class Foo
{
  public string Bar { get; set; }
}
...
var c1 = new List<Foo>() { ... };
var c2 = new List<Foo>() { ... };

The following 2 loops give the same result:

  foreach (var item in c2.Where(f => c1.Any(f1 => f1.Bar.Equals(f.Bar))))
  { ... }

  foreach (var item in c2.Where(f => c1.Select(f1 => f1.Bar).Contains(f.Bar)))
  { ... }

Are they equally fast?

The difference with the other question, is whether the extra Select statement here, changes the importance of the nature of the underlying collection.

In other words: does this Contains:

foos.Contains(foo1)

act on the same "kind of collection" as this one:

foos.Select(f=>f.Bar).Contains(foo1.Bar)

My possible -naive- thought could be: "Once we're behind Linq's Select, everything is just 'Lists', so both the Any and Contains are O(n)."

like image 588
jan Avatar asked Jun 25 '13 16:06

jan


1 Answers

Both the queries are fundamentally implementing the same algorithm. They will each iterate c1 for each item in c2, compare the Bar properties of the two objects, and return as soon as a match is found. The asymptotic complexity of the two cases is the same, meaning they'll both scale equally well (or equally bad, as the case happens to be) as the size of the two sets increase.

There may be a minor difference between the two in the overhead associated with one method over the other, but the difference will not be significant, and they will be smaller and smaller as the size of the collections increase. There isn't any real performance reason to pick one of the two over the other.

There is an option that you didn't show that is quite a bit faster than either of those. You can use a Join to find all of the items in c1 that also exist in c2 without doing a linear search through the sequence:

var query = from first in c1
    join second in c2
    on first.Bar equals second.Bar
    select first;

Another option would be to use a HashSet instead of a List, as that can be much more easily searched:

var set = new HashSet<string>(c1.Select(item => item.Bar));

var query = c2.Where(item => set.Contains(item.Bar));

(This solution is pretty close to what Join will do internally.)

Both of these solutions will be a lot faster than either of your proposed solutions.

like image 145
Servy Avatar answered Sep 22 '22 13:09

Servy