I have a parent-child table relationship. In a repository, I'm doing this:
return (from p in _ctx.Parents
.Include( "Children" )
select p).AsQueryable<Parent>();
Then in a filter, I want to filter the parent by a list of child ids:
IQueryable<Parent> qry; // from above
List<int> ids; // huge list (8500)
var filtered =
from p in qry.Where( p => p.Children.Any(c => ids.Contains(c.ChildId)) ) select s;
My list of ids is huge. This generates a simple SQL statement that does have a huge list of ids "in (1,2,3...)", but it takes no appreciable time to run by itself. EF, however, takes about a full minute just to generate the statement. I proved this by setting a breakpoint and calling:
((ObjectQuery<Parent>)filtered).ToTraceString();
This takes all the time. Is the problem in my last linq statement? I don't know any other way to do the equivalent of Child.ChildId in (ids). And even if my linq statement is bad, how in the world should this take so long?
Unfortunately, building queries in Linq to Entities is a pretty heavy hit, but I've found it usually saves time due to the ability to build queries from their component pieces before actually hitting the database.
It is likely that the way they implement the Contains method uses an algorithm that assumes that Contains is generally used for a relatively small set of data. According to my tests, the amount of time it takes per ID in the list begins to skyrocket at around 8000.
So it might help to break your query into pieces. Group them into groups of 1000 or less, and concatenate a bunch of Where
expressions.
var idGroups = ids.GroupBy(i => i / 1000);
var q = Parents.Include("Children").AsQueryable();
var newQ = idGroups.Aggregate(q,
(s, g) => s.Concat(
q.Where(w => w.Children.Any(wi => g.Contains(wi.ChildId)))));
This speeds things up significantly, but it might not be significantly enough for your purposes, in which case you'll have to resort to a stored procedure. Unfortunately, this particular use case just doesn't fit into "the box" of expected Entity Framework behavior. If your list of ids could begin as a query from the same Entity Context, Entity Framework would have worked fine.
Re-write your query in Lambda syntax and it will cut the time by as much as 3 seconds (or at least it did for my EF project).
return _ctx.Parents.Include( "Children" ).AsQueryable<Parent>();
and
IQueryable<Parent> qry; // from above
List<int> ids; // huge list (8500)
var filtered = qry.Where( p => p.Children.Any(c => ids.Contains(c.ChildId)) );
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With