Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to fix super slow EF/LINQ query executing multiple SQL statements

I have the following code, which is misbehaving:

TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
             where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);

The first line, UserManager.GetUser just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.

First off, it's executing two SQL statements here. The first one grabs every single row in TPM_TASK which is linked to that user, which is sometimes tens of thousands of rows:

SELECT 
 -- Columns
 FROM  TPMDBO.TPM_USERTASKS "Extent1"
 INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
 WHERE "Extent1".USERID = :EntityKeyValue1

This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.

Next, it seems to execute a new query for each TPM_PROJECTVERSION pair in the list above:

SELECT 
 -- Columns
 FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
 WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)

Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.

The query I would like to generate would look something like:

SELECT 
 -- Columns
 FROM  TPMDBO.TPM_USERTASKS "Extent1"
 INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
 INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
 WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10

The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:

from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")

Is there any way to optimize this LINQ statement? I'm using .NET4 and Oracle as the backend DB.

Solution:

This solution is based on Kirk's suggestions below, and works since context.TPM_USERTASK cannot be queried directly:

var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
             where t.TPM_USER.Any(y => y.USERID == UserId) &&
             t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);

It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient none-the-less.

like image 454
Mike Christensen Avatar asked Oct 08 '22 11:10

Mike Christensen


1 Answers

Yes, you are pulling down a specific user, and then referencing the relationship TPM_TASK. That it is pulling down every task attached to that user is exactly what it's supposed to be doing. There's no ORM SQL translation when you're doing it this way. You're getting a user, then getting all his tasks into memory, and then performing some client-side filtering. This is all done using lazy-loading, so the SQL is going to be exceptionally inefficient as it can't batch anything up.

Instead, rewrite your query to go directly against TPM_TASK and filter against the user:

var tasks = (from t in context.TPM_TASK
         where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
         orderby t.DUEDATE, t.PROJECTID
         select t);

Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.

like image 168
Kirk Woll Avatar answered Oct 13 '22 11:10

Kirk Woll