
Why is this (non-correlated) subquery causing such problems?

I've got a large query where a simple subquery optimization dropped it from 8 minutes down to 20 seconds. I'm not sure I understand why the optimization had such a drastic effect.

In essence, here's the problem part:

SELECT  (bunch of stuff)
FROM
  a LEFT OUTER JOIN b ON a.ID = b.a
  LEFT OUTER JOIN c ON b.ID = c.b
  ...
  ...
      INNER JOIN veryLargeTable 
      ON a.ID = veryLargeTable.a 
         AND veryLargeTable.PetID = 
             (SELECT id from Pets WHERE Pets.Name = 'Something')    /* BAD! */
  ...
  ...

In all, there are 16 joined tables. If I replace the second predicate of the veryLargeTable join with a pre-populated variable containing the PetID (instead of using the subquery), the entire query speeds up dramatically:

AND veryLargeTable.PetID = @petID   /* Awesome! */
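
For reference, the variable is populated once before the big query runs; a minimal sketch (the variable's type is assumed to match Pets.id):

DECLARE @petID int;
SELECT @petID = id FROM Pets WHERE Pets.Name = 'Something';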


Obviously, (SELECT id from Pets WHERE Name = 'Something') is being executed for every row. There are two things I don't fully understand:

  1. As far as I can tell, this is a non-correlated subquery. The Pets table is not part of the outer query at all. Aren't non-correlated subqueries independently evaluated (and hence optimized)? Why isn't this the case here?

  2. The execution plans are dramatically different. In the failure case (above), the entire subtree deals with an estimated 950k rows. In the win case (using a variable instead of a subquery), there are only about 125k estimated rows. What's going on? Why are so many more rows involved when that subquery is there? The Pets.Name column definitely has unique data, but no unique constraint as far as I can tell (see the sketch just below this list).
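
In case it matters, I assume the uniqueness could be enforced with something along these lines (the index name is made up, and I haven't actually added it):

CREATE UNIQUE NONCLUSTERED INDEX IX_Pets_Name ON Pets (Name);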

Note that moving the predicate to the WHERE clause doesn't affect the query in either case, as I would expect, since it's an INNER JOIN.

Insights appreciated!

asked Jan 21 '23 by womp


2 Answers

It has been my experience that the more complex your queries get, the less able the SQL optimizer is to create deft plans. Here you've got 16 joins, some or most of them outer joins, and at least one subquery... toss in enough indexes, cardinalities, views, outer applies, and who knows what else, and no one, not even Microsoft engineers*, can devise routines that will uniformly and reliably generate the most efficient plans.

What you've described, I've experienced numerous times: change one simple thing in a messy query and everything's an order of magnitude faster (or, gnashes teeth, slower). I have no method for determining when complex is too complex; it's more a feeling than anything else. My general rule of thumb is, if it looks too long or too complex, simplify where you can, such as your pre-selected single nested value, or by breaking out the part of the query that will always run fast with a small result set, running it first, and storing the results in a temp table.
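
As a rough illustration of the temp-table idea (table and alias names here are invented for the sketch):

-- Run the small, fast piece first and store its result
SELECT id
INTO #matchingPet
FROM Pets
WHERE Name = 'Something';

-- Then reference the temp table in the big query instead of the subquery
... INNER JOIN veryLargeTable vLT ON a.ID = vLT.a
    INNER JOIN #matchingPet mp ON vLT.PetID = mp.id ...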

(* Please note that this is mild sarcasm)

answered Feb 01 '23 by Philip Kelley


As an alternative, I think you could eliminate the subquery with:

...
INNER JOIN veryLargeTable vLT
    ON a.ID = vLT.a 
INNER JOIN Pets p
    ON vLT.PetID = p.id
    AND p.Name = 'Something'
...

answered Feb 01 '23 by Joe Stefanelli