Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speed improvement in LINQ Where(Array.Contains)

Tags:

arrays

c#

linq

I initially had a method that contained a LINQ query returning int[], which then got used later in a fashion similar to:

int[] result = something.Where(s => previousarray.Contains(s.field));

This turned out to be horribly slow, until the first array was retrieved as the native IQueryable<int>. It now runs very quickly, but I'm wondering how I'd deal with the situation if I was provided an int[] from elsewhere which then had to be used as above.

Is there a way to speed up the query in such cases? Converting to a List doesn't seem to help.

like image 748
Dan Avatar asked Dec 03 '22 21:12

Dan


1 Answers

In LINQ-SQL, a Contains will be converted to a SELECT ... WHERE field IN(...) and should be relatively fast. In LINQ-Objects however, it will call ICollection<T>.Contains if the source is an ICollection<T>.

When a LINQ-SQL result is treated as an IEnumerable instead of an IQueryable, you lose the linq provider - i.e., any further operations will be done in memory and not in the database.

As for why its much slower in memory:

Array.Contains() is an O(n) operation so

something.Where(s => previousarray.Contains(s.field));

is O(p * s) where p is the size of previousarray and s is the size of something.

HashSet<T>.Contains() on the other hand is an O(1) operation. If you first create a hashset, you will see a big improvement on the .Contains operation as it will be O(s) instead of O(p * s).

Example:

var previousSet = new HashSet<int>(previousarray);
var result = something.Where(s => previousSet.Contains(s.field));
like image 166
drch Avatar answered Dec 05 '22 10:12

drch