Trying to refactor some code that has gotten really slow recently and I came across a code block that is taking 5+ seconds to execute.
The code consists of 2 statements:
IEnumerable<int> StudentIds = _entities.Filters
.Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
.Select(x => x.StudentId)
.Distinct<int>();
and
_entities.StudentClassrooms
.Include("ClassroomTerm.Classroom.School.District")
.Include("ClassroomTerm.Teacher.Profile")
.Include("Student")
.Where(x => StudentIds.Contains(x.StudentId)
&& x.ClassroomTerm.IsActive
&& x.ClassroomTerm.Classroom.IsActive
&& x.ClassroomTerm.Classroom.School.IsActive
&& x.ClassroomTerm.Classroom.School.District.IsActive).AsQueryable<StudentClassroom>();
So it's a bit messy but first I get a Distinct list of Id's from one Table (Filters), then I query another Table using it.
These are relatively small tables, but it's still 5+ seconds of query time.
I put this in LINQPad and it showed that it was doing the bottom query first then running 1000 "distinct" queries afterwards.
On a whim I changed the "StudentIds" code by just adding .ToArray() at the end. This improved the speed 1000x ... it now takes like 100ms to complete the same query.
What's the deal? What am I doing wrong?
This is one of the pitfalls of deferred execution in Linq: In your first approach StudentIds
is really an IQueryable
, not an in-memory collection. That means using it in the second query will run the query again on the database - each and every time.
Forcing execution of the first query by using ToArray()
makes StudentIds
an in-memory collection and the Contains
part in your second query will run over this collection that contains a fixed sequence of items - This gets mapped to something equivalent to a SQL where StudentId in (1,2,3,4)
query.
This query will of course, be much much faster since you determined this sequence once up-front, and not every time the Where
clause is executed. Your second query without using ToArray()
(I would think) would be mapped to a SQL query with an where exists (...)
sub-query that gets evaluated for each row.
ToArray()
Materializes the initial query to the server memory.
My guess would be the query provider is not able to parse the expression StudentIds.Contains(x.StudentId)
. Hence it probably thinks that the studentIds
is an array already loaded to memory. So it's probably querying the database over and over again during the parsing phase. The only way to know for sure is to setup the profiler.
If you need to do this on the db server, use a join, instead of "contains". If you need to use contains to do what looks like a join problem, you are likely to be missing a surrogate primary key or a foreign key somewhere.
You could also declare studentIds
as IQueryable instead of IEnumerable. This might give the query provider the hint it needs to interpret the studentIds
as expression aka. data not already loaded to memory. I somehow doubt this but worth a try.
If all else fails, use ToArray()
. This will load the initial studentIds
to memory.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With