Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I decide if I should use a CTE or not?

Bear with me, I'm new to intermediate.

My question is - When should I use a CTE? How do I decide if I should use a CTE?

When should I use this:

;with cteTesting as
(
    select  *
    from    testing.first_table
)
select  *
from    testing.second_table s
        inner join cteTesting t
            on s.key = t.key

Over this:

select  *
from    testing.second_table s
        inner join
        (
            select  *
            from    testing.first_table
        ) t
        on s.key = t.key

And why? Is this just for code flow, code readability - or is there something more technical? Will one yield better execution plans under some circumstances?

edit: Just realized my example code example is very poor. I was trying to highlight that there are many cases where I can use a select in the from statement instead of a CTE - how do I decide which one I should be using?

like image 528
Michael A Avatar asked Jul 12 '12 07:07

Michael A


People also ask

What is CTE and when we should use it?

A Common Table Expression (CTE) is the result set of a query which exists temporarily and for use only within the context of a larger query. Much like a derived table, the result of a CTE is not stored and exists only for the duration of the query. This article will focus on non-recurrsive CTEs.

Why would you use a CTE?

A CTE can be used to: Create a recursive query. For more information, see Recursive Queries Using Common Table Expressions. Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.

When should one use a CTE over a subquery?

CTE can be reusable: One advantage of using CTE is CTE is reusable by design. Instead of having to declare the same subquery in every place you need to use it, you can use CTE to define a temporary table once, then refer to it whenever you need it.

Is CTE better than subquery?

The performance of CTEs and subqueries should, in theory, be the same since both provide the same information to the query optimizer. One difference is that a CTE used more than once could be easily identified and calculated once. The results could then be stored and read multiple times.


3 Answers

For simple examples, it doesn't make much difference. If you need to use the Recursive features to build up a hierarchy, then you haven't much choice - you need to use a CTE.

Another case where it probably doesn't make much performance difference, but does for readability, is when you need to join the same subquery multiple times. If you're using subqueries, you have to repeat the entire expression, whereas with CTEs, you just use the name twice:

;With NamedExpression as (
    select t1.ID,t2.ID as ID2,SUM(t3.Value) as Val
    from
      Table1 t1
        left join
      Table2 t2 on t1.id = t2.t1id
         inner join
      Table3 t3 on t3.col = t1.id or t3.col2 = t2.id
    group by
      t1.ID,t2.ID
)
select
    *
from
    NamedExpression ne
        inner join
    NamedExpression ne2
        on
            ne.ID2 = ne2.ID

It should also be noted that if you do the above as subqueries, and the expressions are particularly complex, it can sometimes take time for the reader/maintainer to verify that the two subqueries are in fact identical, and there's not some subtle difference between the two


Also, if you have an indenting style that says that subqueries should appear further to the right than their enclosing query, then expressions that build on other expressions can cause all of the code to shift to the right - whereas with CTEs, you stop and move back to the left in building each subexpression (CTE):

;WITH CTE1 AS (
    SELECT
    ...
), CTE2 as (
    SELECT
    ...
    FROM CTE1
), CTE3 as (
    SELECT
    ...
    FROM CTE2
)
select * from CTE3

vs:

select *
from
   (
        select ...
        from
             (
                 select ...
                 from
                     (
                          select ...
like image 94
Damien_The_Unbeliever Avatar answered Oct 03 '22 17:10

Damien_The_Unbeliever


I personally find the CTE version to be more readable especially if the select gets bigger.

When you use the derived table more than once in the main SELECT it might be better to use the CTE because it tells the database that you want to run this only once. Although I wouldn't be surprised if the optimizer was smart enough to detect two identical sub-selects in the from clause and only runs them once:

with foo as (
   select ..
   from bar
)
select f1.*
from foo f1 
  join foo f2 on ...

vs.

select f1.*
from (select ... from bar ) f1
  join (select ... from bar) f2 on ...

I think the most important part is to be consistent (across what you write and inside your team).

like image 40
a_horse_with_no_name Avatar answered Oct 03 '22 17:10

a_horse_with_no_name


I noticed that JOINs (esp. when combined with lot of WHERE clauses) can have disastrous performance when large data sets are involved.

CTEs can resolve this by selecting only relevant records and joining over these subsets.

Consider a CTE as a sort of pre-select to prepare data for the final SELECT.

like image 41
devio Avatar answered Oct 03 '22 15:10

devio