I've got a large query where a simple subquery optimization dropped it from 8 minutes down to 20 seconds. I'm not sure I understand why the optimization had such a drastic effect.
In essence, here's the problem part:
SELECT (bunch of stuff)
FROM
a LEFT OUTER JOIN b ON a.ID = b.a
LEFT OUTER JOIN c ON b.ID = c.b
...
...
INNER JOIN veryLargeTable
ON a.ID = veryLargeTable.a
AND veryLargeTable.PetID =
(SELECT id from Pets WHERE Pets.Name = 'Something') /* BAD! */
...
...
In all, there are 16 joined tables. If I replace the second predicate of the veryLargeTable join with a pre-populated variable containing the PetID (instead of using the subquery), the entire query speeds up dramatically:
AND veryLargeTable.PetID = @petID /* Awesome! */
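For reference, the variable version is populated once before the big query runs. This is a sketch using the question's table and column names (Pets, id, Name); the variable name @petID matches the snippet above:

```sql
-- Evaluate the small lookup exactly once, ahead of the main query
DECLARE @petID int;
SELECT @petID = id FROM Pets WHERE Name = 'Something';

-- ...then the join predicate compares against a scalar:
-- AND veryLargeTable.PetID = @petID
```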
Obviously, (SELECT id from Pets WHERE Name = 'Something')
is being executed for every row. There are two things I don't fully understand:
As far as I can tell, this is a non-correlated subquery. The Pets table is not part of the outer query at all. Aren't non-correlated subqueries independently evaluated (and hence optimized)? Why isn't this the case here?
The execution plans are dramatically different. In the failure case (above), the entire subtree deals with an estimated 950k rows. In the win case (using a variable instead of a subquery), there's only about 125k estimated rows. What's going on? Why are so many more rows involved if that subquery is there? The Pets.Name column definitely has unique data (but no unique constraint as far as I can tell).
Note that moving the predicate to the WHERE clause doesn't affect the query in either case, as I would expect, since it's an INNER JOIN.
Insights appreciated!
It has been my experience that the more complex your queries get, the less able the SQL optimizer is to create deft plans. Here you've got 16 joins, some or most of them outer joins, and at least one subquery... toss in enough indexes, cardinalities, views, outer applies, and who knows what else, and no one, not even Microsoft engineers*, can devise routines that will uniformly and reliably generate optimal plans.
What you've described, I've experienced numerous times: change one simple thing in a messy query and everything becomes an order of magnitude faster (or, *gnashes teeth*, slower). I have no method for determining when complex is too complex; it's more a feeling than anything else. My general rule of thumb is: if it looks too long or too complex, simplify where you can, such as with your pre-selected single nested value, or by breaking out a part of the query that will always run fast with a small result set, running it first, and storing the results in a temp table.
(* Please note that this is mild sarcasm)
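The temp-table approach mentioned above might look something like this. It's a sketch against the question's hypothetical schema (tables a, b, veryLargeTable, Pets); the temp table name #matchingPets is made up for illustration:

```sql
-- Run the small, fast piece first and materialize its result
SELECT id
INTO #matchingPets
FROM Pets
WHERE Name = 'Something';

-- ...then join the big query against the tiny temp table
SELECT (bunch of stuff)
FROM a
LEFT OUTER JOIN b ON a.ID = b.a
...
INNER JOIN veryLargeTable vLT ON a.ID = vLT.a
INNER JOIN #matchingPets mp ON vLT.PetID = mp.id
```

Because the temp table is populated before the main query compiles, the optimizer can see its actual (tiny) row count and statistics, which often yields better estimates than a nested subquery buried in a 16-table plan.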
As an alternative, I think you could eliminate the sub-query with:
...
INNER JOIN veryLargeTable vLT
ON a.ID = vLT.a
INNER JOIN Pets p
ON vLT.PetID = p.id
AND p.Name = 'Something'
...