Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using 'where then Union' or Using 'Union then Where'

Please have these two types of query in your mind:

--query1
Select someFields
From someTables
Where someWhereClues
Union all
Select someFields
FROM some Tables
Where someWhereClues

--query2
Select * FROM (
    Select someFields 
    From someTables
    Union all 
    Select someFields
    FROM someTables
    ) DT
Where someMixedWhereClues

Note :
In both queries final result fields are same

I thought the 1st. query is faster or its performance is better!
But after some researches I confused by this note:

SQL Server (as a sample of RDBMS) first reads whole data then seek records. => so in both queries all records will read and seek.

Please Help me on my misunderstandings, and on if there is any other differences between query1 and query2 ?


Edit: adding sample plans:

select t.Name, t.type from sys.tables t where t.type = 'U'
union all
select t.Name, t.type from sys.objects t where t.type = 'U'

select * from (
    select t.Name, t.type from sys.tables t
    union all
    select t.Name, t.type from sys.objects t
    ) dt
where dt.type = 'U'

Execution Plans are: enter image description hereenter image description here

both are same and 50%

like image 995
shA.t Avatar asked Mar 17 '15 12:03

shA.t


3 Answers

The SQL Server query optimizer, optimizes both queries so you get nearly the same performance.

like image 186
Nick N. Avatar answered Oct 18 '22 01:10

Nick N.


The first one cannot be slower. Here is the reasoning:

  • If the WHERE clauses in the first can efficiently use an INDEX, there will be fewer rows to collect together in the UNION. Fewer rows --> faster.
  • The second one does not have an INDEX on the UNION, hence the WHERE cannot be optimized in that way.

Here are things that could lead to the first being slower. But I see them as exceptions, not the rule.

  • A different amount of parallelism is achieved.
  • Different stuff happens to be cached at the time you run the queries.

Caveat: I am assuming all three WHERE clauses are identical (as your example shows).

like image 43
Rick James Avatar answered Oct 18 '22 00:10

Rick James


As a rule of thumb, I will always consider using the first type of the query.

In made-up samples and queries with simple WHERE predicates both will use the same plan. But in a more complex query, with more complicated predicates, the optimizer might not come up with an equally efficient solution for the second type of query (it's just an optimizer, and is bound by resource and time constraints). The more complex the query is, the less chance is the optimizer finds the best execution plan (as it will eventually time-out and choose the least worst plan found so far). And it gets even worse if the predicates are ORed.

like image 1
dean Avatar answered Oct 18 '22 00:10

dean