I could write a query using an aggregate function in two ways:
select team, count(min) as min_count
from table
group by team
having count(min) > 500
or
select *
from (
    select team, count(min) as min_count
    from table
    group by team
) as A
where A.min_count > 500
Are there any performance benefits to either approach or are they functionally the same thing?
The main difference between a WITH clause and a subquery in Oracle is that a query defined in a WITH clause can be referenced multiple times. This also allows some optimizations, such as turning it into a temporary table with the MATERIALIZE hint. A WITH clause can also express recursive queries by referencing itself.
Subqueries in a HAVING clause: You may place a subquery in the HAVING clause of an outer query. This allows you to filter groups of rows based on the result returned by the subquery.
CTEs can be more readable: Another advantage of a CTE is that it is often more readable than a subquery. Because a CTE can be reused, you can write less code with it than with repeated subqueries. People also tend to follow logic more easily in sequence than in a nested fashion.
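As a rough illustration of a CTE being referenced more than once, here is a minimal sketch that runs on SQLite through Python's standard sqlite3 module rather than on Oracle. The table name `games`, the column `minutes`, and the sample rows are assumptions made up for this example.

```python
import sqlite3

# Hypothetical in-memory table; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (team TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [("A", 10), ("A", 20), ("A", 30), ("B", 15), ("D", 5), ("D", 25)],
)

# team_counts is defined once and referenced twice:
# once as the row source and once inside the scalar subquery.
cte_rows = conn.execute(
    """
    WITH team_counts AS (
        SELECT team, COUNT(minutes) AS min_count
        FROM games
        GROUP BY team
    )
    SELECT team, min_count
    FROM team_counts
    WHERE min_count > (SELECT AVG(min_count) FROM team_counts)
    ORDER BY team
    """
).fetchall()

print(cte_rows)
```

Writing the same query with plain subqueries would mean repeating the aggregation text in both places.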
A HAVING clause is like a WHERE clause, but applies only to groups as a whole (that is, to the rows in the result set representing groups), whereas the WHERE clause applies to individual rows. A query can contain both a WHERE clause and a HAVING clause.
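The split between row filtering and group filtering can be sketched as follows, again using SQLite through Python's sqlite3 module. The `stats` table, its columns, and the data are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical in-memory table; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (team TEXT, player TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        ("A", "p1", 30),
        ("A", "p2", 12),
        ("A", "p3", 0),
        ("B", "p4", 25),
        ("B", "p5", 0),
        ("C", "p6", 40),
    ],
)

rows = conn.execute(
    """
    SELECT team, COUNT(*) AS players
    FROM stats
    WHERE minutes > 0        -- applied to individual rows, before grouping
    GROUP BY team
    HAVING COUNT(*) >= 2     -- applied to each group, after aggregation
    ORDER BY team
    """
).fetchall()

print(rows)
```

WHERE drops the zero-minute rows first, so only team A still has at least two rows when HAVING is evaluated.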
The two versions are functionally the same. Well, the second is syntactically incorrect, but I assume you mean:
select *
from (
    select team, count(min) as count
    from table
    group by team
) t
where count > 500
(You need the alias on the calculation, and several leading databases require an alias on a subquery in a FROM clause.)
Being functionally equivalent does not mean that they are necessarily optimized the same way. There are often multiple ways to write a query that are functionally equivalent. However, the specific database engine/optimizer can choose (and often does choose) different optimization paths.
In this case, the query is so simple that it is hard to think of multiple optimization paths. For both versions, the engine basically has to aggregate the data and then test the second column against the filter. I personally cannot see many variations on this theme. Any decent SQL engine should use indexes, if appropriate, either in both cases or in neither.
So, the answer to this specific question is that in any reasonable database, these should result in the same execution plan (i.e., in the use of indexes, the use of parallelism, and the choice of aggregation algorithm). However, being functionally equivalent does not mean that a given database engine is going to produce the same execution plan. So, the general answer is "no": equivalent queries are not guaranteed to perform identically.
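One way to check this on a particular engine is to run both forms and look at the plans it reports. The sketch below does that on SQLite through Python's sqlite3 module. It is only an illustration: the table name `games`, the column `minutes`, the sample rows, and the threshold of 1 (instead of 500) are assumptions, and the plans printed by SQLite say nothing about what Oracle, SQL Server, or PostgreSQL would choose.

```python
import sqlite3

# Hypothetical in-memory table; names, data, and threshold are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (team TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [
        ("A", 10), ("A", 20), ("A", 30),
        ("B", 15),
        ("C", 12), ("C", None),   # COUNT(minutes) skips the NULL
        ("D", 5), ("D", 25),
    ],
)

having_sql = """
    SELECT team, COUNT(minutes) AS min_count
    FROM games
    GROUP BY team
    HAVING COUNT(minutes) > 1
    ORDER BY team
"""

subquery_sql = """
    SELECT *
    FROM (
        SELECT team, COUNT(minutes) AS min_count
        FROM games
        GROUP BY team
    ) AS a
    WHERE a.min_count > 1
    ORDER BY team
"""

having_rows = conn.execute(having_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(having_rows == subquery_rows)

# Print the plan SQLite reports for each form; the last column of each
# EXPLAIN QUERY PLAN row is the human-readable detail text.
for label, sql in (("HAVING", having_sql), ("subquery", subquery_sql)):
    print(label)
    for plan_row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", plan_row[-1])
```

The row comparison confirms functional equivalence on this data; whether the two plans match is something to read off the output for the engine and version at hand, not something to assume.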