
Why does a SQL join choose a sub-optimal query plan?

Tags:

sql

sql-server

OK, I realize this is a fairly vague question, but bear with me.

I have experienced this problem on numerous occasions with different and unrelated queries. The query below takes many minutes to execute:

SELECT <Fields>
FROM <Multiple Tables Joined>
    LEFT JOIN (SELECT <Fields> FROM <Multiple Tables Joined>) AS <Alias> ON <Condition>

However, simply adding a join hint makes the query execute in seconds:

SELECT <Fields>
FROM <Multiple Tables Joined>
    LEFT HASH JOIN (SELECT <Fields> FROM <Multiple Tables Joined>) AS <Alias> ON <Condition>

The strange thing is that the join type specified in the hint is not really what improves performance. It appears that the hint causes the optimizer to execute the subquery in isolation and then perform the join. I see the same improvement if I move the subquery into a table-valued function (not an inline one). For example:

SELECT <Fields>
FROM <Multiple Tables Joined>
    LEFT JOIN dbo.MySubQueryFunction() ON <Condition>
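The inline/non-inline distinction matters here: an inline table-valued function is expanded into the calling query much like a view, so the optimizer plans it together with the outer joins, whereas a multi-statement (non-inline) function runs its body separately and returns a filled table variable. A minimal sketch of the latter, assuming hypothetical tables `dbo.Orders` and `dbo.Payments` in place of the elided `<Multiple Tables Joined>`; the columns and the function body are illustrative only, not the actual query from the question:

```sql
-- Hypothetical tables standing in for <Multiple Tables Joined>.
CREATE TABLE dbo.Orders (
    OrderId    int NOT NULL PRIMARY KEY,
    CustomerId int NOT NULL
);
CREATE TABLE dbo.Payments (
    PaymentId int            NOT NULL PRIMARY KEY,
    OrderId   int            NOT NULL,
    Amount    decimal(10, 2) NOT NULL
);
GO

-- Multi-statement (non-inline) TVF: its body runs on its own and fills
-- @Result; the calling query then joins to that materialized result.
-- The body is a hypothetical stand-in for the real subquery.
CREATE FUNCTION dbo.MySubQueryFunction()
RETURNS @Result TABLE (
    OrderId   int            NOT NULL PRIMARY KEY,
    TotalPaid decimal(38, 2) NOT NULL
)
AS
BEGIN
    INSERT INTO @Result (OrderId, TotalPaid)
    SELECT p.OrderId, SUM(p.Amount)
    FROM dbo.Payments AS p
    GROUP BY p.OrderId;

    RETURN;
END
GO

-- Same shape as the query above: outer tables LEFT JOINed to the function.
SELECT o.OrderId, o.CustomerId, f.TotalPaid
FROM dbo.Orders AS o
    LEFT JOIN dbo.MySubQueryFunction() AS f ON f.OrderId = o.OrderId;
```

One consequence worth keeping in mind: the optimizer cannot see inside such a function, so it plans the outer join using a fixed row guess for it (1 row under the legacy cardinality estimator, 100 rows under the estimator introduced in SQL Server 2014) rather than statistics.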

Does anybody have an idea why the optimizer is so dumb in this case?

Darrel Miller, asked Feb 06 '09 21:02


1 Answer

If any of those tables are table variables, the optimizer uses a fixed estimate of 1 row for them and usually chooses a nested loops join.

It does this because table variables have no statistics, so the optimizer has nothing better to base the estimate on.
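To see this effect, the sketch below loads the same key set into a table variable and into a temporary table and joins each one to a hypothetical `#Orders` table (all object names are made up for illustration). All three queries return the same count; the difference shows up in the estimated versus actual row counts of the execution plans. It assumes SQL Server before 2019, or a database compatibility level below 150; from SQL Server 2019 on, table variable deferred compilation uses the actual row count when the statement is first compiled.

```sql
-- Hypothetical table to join against: OrderId 1..50000.
CREATE TABLE #Orders (OrderId int NOT NULL PRIMARY KEY, CustomerId int NOT NULL);

INSERT INTO #Orders (OrderId, CustomerId)
SELECT TOP (50000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       1
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

-- A table variable has no statistics, so this join is compiled with a
-- fixed estimate of 1 row for @Ids, which tends to favour nested loops.
DECLARE @Ids TABLE (Id int NOT NULL PRIMARY KEY);
INSERT INTO @Ids (Id) SELECT OrderId FROM #Orders WHERE OrderId <= 20000;

SELECT COUNT(*) AS Matches
FROM @Ids AS i
    JOIN #Orders AS o ON o.OrderId = i.Id;

-- OPTION (RECOMPILE) compiles the statement after @Ids is filled, so the
-- optimizer sees its real row count (it still has no column statistics).
SELECT COUNT(*) AS Matches
FROM @Ids AS i
    JOIN #Orders AS o ON o.OrderId = i.Id
OPTION (RECOMPILE);

-- A temporary table gets statistics like a permanent table, so the
-- estimate for #Ids reflects the rows actually inserted.
CREATE TABLE #Ids (Id int NOT NULL PRIMARY KEY);
INSERT INTO #Ids (Id) SELECT Id FROM @Ids;

SELECT COUNT(*) AS Matches
FROM #Ids AS i
    JOIN #Orders AS o ON o.OrderId = i.Id;

DROP TABLE #Ids;
DROP TABLE #Orders;
```

Switching to a temporary table or adding OPTION (RECOMPILE) are the usual ways to give the optimizer a realistic row count when a table variable sits in a join.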

Amy B, answered Sep 29 '22 06:09