I initially had a method that contained a LINQ query returning int[]
, which then got used later in a fashion similar to:
int[] result = something.Where(s => previousarray.Contains(s.field));
This turned out to be horribly slow, until the first array was retrieved as the native IQueryable<int>
. It now runs very quickly, but I'm wondering how I'd deal with the situation if I was provided an int[]
from elsewhere which then had to be used as above.
Is there a way to speed up the query in such cases? Converting to a List doesn't seem to help.
In LINQ-SQL, a Contains
will be converted to a SELECT ... WHERE field IN(...)
and should be relatively fast. In LINQ-Objects however, it will call ICollection<T>.Contains
if the source is an ICollection<T>
.
When a LINQ-SQL result is treated as an IEnumerable
instead of an IQueryable
, you lose the linq provider - i.e., any further operations will be done in memory and not in the database.
As for why its much slower in memory:
Array.Contains()
is an O(n) operation so
something.Where(s => previousarray.Contains(s.field));
is O(p * s) where p is the size of previousarray
and s is the size of something
.
HashSet<T>.Contains()
on the other hand is an O(1) operation. If you first create a hashset, you will see a big improvement on the .Contains
operation as it will be O(s) instead of O(p * s).
Example:
var previousSet = new HashSet<int>(previousarray);
var result = something.Where(s => previousSet.Contains(s.field));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With