Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

WHERE clause better execute before IN and JOIN or after

Tags:

sql-server

I read this article: Logical Processing Order of the SELECT statement

in end of article has been write ON and JOIN clause consider before WHERE.

Consider we have a master table that has 10 million records and a detail table (that has reference to master table(FK)) with 50 million record. We have a query that want just 100 record of detail table according a PK in master table.

In this situation ON and JOIN execute before WHERE?I mean that do we have 500 million record after JOIN and then WHERE apply to it?or first WHERE apply and then JOIN and ON Consider? If second answer is true do it has incoherence with top article?

thanks

like image 956
Arian Avatar asked Mar 28 '11 18:03

Arian


People also ask

Does WHERE clause go before or after join?

Normally, filtering is processed in the WHERE clause once the two tables have already been joined. It's possible, though that you might want to filter one or both of the tables before joining them. For example, you only want to create matches between the tables under certain circumstances.

Can we use WHERE condition before join?

To use the WHERE clause to perform the same join as you perform using the INNER JOIN syntax, enter both the join condition and the additional selection condition in the WHERE clause. The tables to be joined are listed in the FROM clause, separated by commas. This query returns the same output as the previous example.

Which is faster join or in?

If the joining column is UNIQUE and marked as such, both these queries yield the same plan in SQL Server . If it's not, then IN is faster than JOIN on DISTINCT .

Can we use joins in WHERE clause?

You join two tables by creating a relationship in the WHERE clause between at least one column from one table and at least one column from another. The join creates a temporary composite table where each pair of rows (one from each table) that satisfies the join condition is linked to form a single row.


3 Answers

In the case of an INNER JOIN or a table on the left in a LEFT JOIN, in many cases, the optimizer will find that it is better to perform any filtering first (highest selectivity) before actually performing whatever type of physical join - so there are obviously physical order of operations which are better.

To some extent you can sometimes control this (or interfere with this) with your SQL, for instance, with aggregates in subqueries.

The logical order of processing the constraints in the query can only be transformed according to known invariant transformations.

So:

SELECT *
FROM a
INNER JOIN b
    ON a.id = b.id
WHERE a.something = something
    AND b.something = something

is still logically equivalent to:

SELECT *
FROM a
INNER JOIN b
    ON a.id = b.id
    AND a.something = something
    AND b.something = something

and they will generally have the same execution plan.

On the other hand:

SELECT *
FROM a
LEFT JOIN b
    ON a.id = b.id
WHERE a.something = something
    AND b.something = something

is NOT equivalent to:

SELECT *
FROM a
LEFT JOIN b
    ON a.id = b.id
    AND a.something = something
    AND b.something = something

and so the optimizer isn't going to transform them into the same execution plan.

The optimizer is very smart and is able to move things around pretty successfully, including collapsing views and inline table-valued functions as well as even pushing things down through certain kinds of aggregates fairly successfully.

Typically, when you write SQL, it needs to be understandable, maintainable and correct. As far as efficiency in execution, if the optimizer is having difficulty turning the declarative SQL into an execution plan with acceptable performance, the code can sometimes be simplified or appropriate indexes or hints added or broken down into steps which should perform more quickly - all in successive orders of invasiveness.

like image 154
Cade Roux Avatar answered Oct 17 '22 10:10

Cade Roux


It doesn't matter

Logical processing order is always honoured: regardless of actual processing order

INNER JOINs and WHERE conditions are effectively associative and commutative (hence the ANSI-89 "join in the where" syntax) so actual order doesn't matter

Logical order becomes important with outer joins and more complex queries: applying WHERE on an OUTER table changes the logic completely.

Again, it doesn't matter how the optimiser does it internally so long as the query semantics are maintained by following logical processing order.

And the key word here is "optimiser": it does exactly what it says

like image 38
gbn Avatar answered Oct 17 '22 08:10

gbn


Just re-reading Paul White's excellent series on the Query Optimiser and remembered this question.

It is possible to use an undocumented command to disable specific transformation rules and get some insight into the transformations applied.

For (hopefully!) obvious reasons only try this on a development instance and remember to re-enable them and remove any suboptimal plans from the cache.

USE AdventureWorks2008;

/*Disable the rules*/
DBCC RULEOFF ('SELonJN');
DBCC RULEOFF ('BuildSpool');


 SELECT  P.ProductNumber, 
         P.ProductID, 
        I.Quantity
 FROM    Production.Product P
 JOIN    Production.ProductInventory I
         ON  I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)

You can see with those two rules disabled it does a cartesian join and filter after.

Rules Off Plan

/*Re-enable them*/   
DBCC RULEON ('SELonJN');
DBCC RULEON ('BuildSpool');

 SELECT  P.ProductNumber, 
         P.ProductID, 
        I.Quantity
 FROM    Production.Product P
 JOIN    Production.ProductInventory I
         ON  I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)

With them enabled the predicate is pushed right down into the index seek and so reduces the number of rows processed by the join operation.

Rules on Plan

like image 9
Martin Smith Avatar answered Oct 17 '22 10:10

Martin Smith