Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PARTITION BY with and without KEEP in Oracle

Tags:

I came across two queries which seems to have the same result: applying aggregate function on partition.

I am wondering if there is any difference between these two queries:

SELECT empno,    deptno,    sal,    MIN(sal) OVER (PARTITION BY deptno) "Lowest",    MAX(sal) OVER (PARTITION BY deptno) "Highest" FROM empl  SELECT empno,    deptno,    sal,    MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) "Lowest",    MAX(sal) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) "Highest" FROM empl 

The first version is more logical but second one may be some kind special case, maybe some performance optimization.

like image 481
aurelije Avatar asked Nov 22 '13 13:11

aurelije


People also ask

What is the use of partition by in oracle?

The PARTITION BY RANGE clause of the CREATE TABLE statement specifies that the table or index is to be range-partitioned. The PARTITION clauses identify the individual partition ranges, and the optional subclauses of a PARTITION clause can specify physical and other attributes specific to a partition segment.

What is keep in Oracle?

KEEP (DENSE_RANK LAST means give a (consecutive) ranking to those ordered rows within each group (rows with identical values for the ordering columns will be given the same rank) and KEEP only those rows that are LAST in the ranking (i.e. the rows with the maximum startdate ); and finally. MAX(ra.

Which is better GROUP BY or partition by?

A GROUP BY normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row. PARTITION BY does not affect the number of rows returned, but it changes how a window function's result is calculated.

Is GROUP BY faster than partition by?

However, in most circumstances the PARTITION BY version takes about twice as long as the GROUP BY version.


2 Answers

MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) 

The statement can be considered in (roughly) right-to-left order:

  • OVER (PARTITION BY deptno) means partition the rows into distinct groups of deptno; then
  • ORDER BY sal means, for each partition, order the rows by sal (implicitly using ASCending order); then
  • KEEP (DENSE_RANK FIRST means give a (consecutive) ranking to the ordered rows for each partition (rows with identical values for the ordering columns will be given the same rank) and discard all rows which are not ranked first; and finally
  • MIN(sal) for the remaining rows of each partition, return the minimum salary.

In this case the MIN and DENSE_RANK FIRST are both operating on the sal column so will do the same thing and the KEEP (DENSE_RANK FIRST ORDER BY sal) is redundant.

However if you use a different column for the minimum then you can see the effects:

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE test (name, sal, deptno) AS SELECT 'a', 1, 1 FROM DUAL UNION ALL SELECT 'b', 1, 1 FROM DUAL UNION ALL SELECT 'c', 1, 1 FROM DUAL UNION ALL SELECT 'd', 2, 1 FROM DUAL UNION ALL SELECT 'e', 3, 1 FROM DUAL UNION ALL SELECT 'f', 3, 1 FROM DUAL UNION ALL SELECT 'g', 4, 2 FROM DUAL UNION ALL SELECT 'h', 4, 2 FROM DUAL UNION ALL SELECT 'i', 5, 2 FROM DUAL UNION ALL SELECT 'j', 5, 2 FROM DUAL; 

Query 1:

SELECT DISTINCT   MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS min_sal_first_sal,   MAX(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS max_sal_first_sal,   MIN(name) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS min_name_first_sal,   MAX(name) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) AS max_name_first_sal,   MIN(name) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) AS min_name_last_sal,   MAX(name) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) AS max_name_last_sal,   deptno FROM test 

Results:

| MIN_SAL_FIRST_SAL | MAX_SAL_FIRST_SAL | MIN_NAME_FIRST_SAL | MAX_NAME_FIRST_SAL | MIN_NAME_LAST_SAL | MAX_NAME_LAST_SAL | DEPTNO | |-------------------|-------------------|--------------------|--------------------|-------------------|-------------------|--------| |                 1 |                 1 |                  a |                  c |                 e |                 f |      1 | |                 4 |                 4 |                  g |                  h |                 i |                 j |      2 | 
like image 75
MT0 Avatar answered Nov 02 '22 20:11

MT0


In your example, there's no difference, because your aggregate is on the same column that you are sorting on. The real point/power of "KEEP" is when you aggregate and sort on different columns. For example (borrowing the "test" table from the other answer)...

SELECT deptno,  min(name) keep ( dense_rank first order by sal desc, name  ) , max(sal) FROM test group by deptno 

;

This query gets the name of person with the highest salary in each department. Consider the alternative without a "KEEP" clause:

SELECT deptno, name, sal FROM test t WHERE not exists ( SELECT 'person with higher salary in same department'                                             FROM test t2                                               WHERE t2.deptno = t.deptno                                             and ((  t2.sal > t.sal )                                             OR ( t2.sal = t.sal AND t2.name < t.name ) ) ) 

The KEEP clause is easier and more efficient (only 3 consistent gets vs 34 gets for the alternative, in this simple example).

like image 31
Matthew McPeak Avatar answered Nov 02 '22 19:11

Matthew McPeak