I came across two queries which seems to have the same result: applying aggregate function on partition.
I am wondering if there is any difference between these two queries:
SELECT empno, deptno, sal, MIN(sal) OVER (PARTITION BY deptno) "Lowest", MAX(sal) OVER (PARTITION BY deptno) "Highest" FROM empl SELECT empno, deptno, sal, MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) "Lowest", MAX(sal) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) "Highest" FROM empl
The first version is more logical but second one may be some kind special case, maybe some performance optimization.
The PARTITION BY RANGE clause of the CREATE TABLE statement specifies that the table or index is to be range-partitioned. The PARTITION clauses identify the individual partition ranges, and the optional subclauses of a PARTITION clause can specify physical and other attributes specific to a partition segment.
KEEP (DENSE_RANK LAST means give a (consecutive) ranking to those ordered rows within each group (rows with identical values for the ordering columns will be given the same rank) and KEEP only those rows that are LAST in the ranking (i.e. the rows with the maximum startdate ); and finally. MAX(ra.
A GROUP BY normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row. PARTITION BY does not affect the number of rows returned, but it changes how a window function's result is calculated.
However, in most circumstances the PARTITION BY version takes about twice as long as the GROUP BY version.
MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno)
The statement can be considered in (roughly) right-to-left order:
OVER (PARTITION BY deptno)
means partition the rows into distinct groups of deptno
; thenORDER BY sal
means, for each partition, order the rows by sal
(implicitly using ASC
ending order); thenKEEP (DENSE_RANK FIRST
means give a (consecutive) ranking to the ordered rows for each partition (rows with identical values for the ordering columns will be given the same rank) and discard all rows which are not ranked first; and finallyMIN(sal)
for the remaining rows of each partition, return the minimum salary.In this case the MIN
and DENSE_RANK FIRST
are both operating on the sal
column so will do the same thing and the KEEP (DENSE_RANK FIRST ORDER BY sal)
is redundant.
However if you use a different column for the minimum then you can see the effects:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE test (name, sal, deptno) AS SELECT 'a', 1, 1 FROM DUAL UNION ALL SELECT 'b', 1, 1 FROM DUAL UNION ALL SELECT 'c', 1, 1 FROM DUAL UNION ALL SELECT 'd', 2, 1 FROM DUAL UNION ALL SELECT 'e', 3, 1 FROM DUAL UNION ALL SELECT 'f', 3, 1 FROM DUAL UNION ALL SELECT 'g', 4, 2 FROM DUAL UNION ALL SELECT 'h', 4, 2 FROM DUAL UNION ALL SELECT 'i', 5, 2 FROM DUAL UNION ALL SELECT 'j', 5, 2 FROM DUAL;
Query 1:
SELECT DISTINCT MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS min_sal_first_sal, MAX(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS max_sal_first_sal, MIN(name) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS min_name_first_sal, MAX(name) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS max_name_first_sal, MIN(name) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) AS min_name_last_sal, MAX(name) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) AS max_name_last_sal, deptno FROM test
Results:
| MIN_SAL_FIRST_SAL | MAX_SAL_FIRST_SAL | MIN_NAME_FIRST_SAL | MAX_NAME_FIRST_SAL | MIN_NAME_LAST_SAL | MAX_NAME_LAST_SAL | DEPTNO | |-------------------|-------------------|--------------------|--------------------|-------------------|-------------------|--------| | 1 | 1 | a | c | e | f | 1 | | 4 | 4 | g | h | i | j | 2 |
In your example, there's no difference, because your aggregate is on the same column that you are sorting on. The real point/power of "KEEP" is when you aggregate and sort on different columns. For example (borrowing the "test" table from the other answer)...
SELECT deptno, min(name) keep ( dense_rank first order by sal desc, name ) , max(sal) FROM test group by deptno
;
This query gets the name of person with the highest salary in each department. Consider the alternative without a "KEEP" clause:
SELECT deptno, name, sal FROM test t WHERE not exists ( SELECT 'person with higher salary in same department' FROM test t2 WHERE t2.deptno = t.deptno and (( t2.sal > t.sal ) OR ( t2.sal = t.sal AND t2.name < t.name ) ) )
The KEEP clause is easier and more efficient (only 3 consistent gets vs 34 gets for the alternative, in this simple example).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With