Listagg Partition instead of Group By

Question

Which is more efficient of the 2, specifically in the case when using Listagg with varchar type data?

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) AS "Employees"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

OR

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) 
       OVER (PARTITION BY department_id) AS "Employees"
  FROM employees
  ORDER BY department_id;

For me, unless I'm only selecting 1/2 columns, I'll always use partition as I don't have to include each column in the 'GROUP BY' clause.

MT0 · Accepted Answer

The two queries give different outputs. Using GROUP BY will return one row-per-group whilst using OVER ( PARTITION BY .. ) will return all the rows and duplicate the LISTAGG result for each row in the partition.

Use whichever solution is more appropriate for your desired output - but they are not equivalent.

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE employees ( department_id, last_name, hire_date ) AS
SELECT 1, 'AAA', DATE '2018-01-01' FROM DUAL UNION ALL
SELECT 1, 'BBB', DATE '2018-01-02' FROM DUAL UNION ALL
SELECT 1, 'CCC', DATE '2018-01-03' FROM DUAL UNION ALL
SELECT 2, 'DDD', DATE '2018-01-01' FROM DUAL UNION ALL
SELECT 2, 'EEE', DATE '2018-01-02' FROM DUAL;

Query 1:

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) AS "Employees"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id

Results:

| Dept. |     Employees |
|-------|---------------|
|     1 | AAA; BBB; CCC |
|     2 |      DDD; EEE |

Query 2:

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) 
       OVER (PARTITION BY department_id) AS "Employees"
  FROM employees
  ORDER BY department_id

Results:

| Dept. |     Employees |
|-------|---------------|
|     1 | AAA; BBB; CCC |
|     1 | AAA; BBB; CCC |
|     1 | AAA; BBB; CCC |
|     2 |      DDD; EEE |
|     2 |      DDD; EEE |

William Robertson · Answer

Both queries do the same amount of work, as each one has to perform a full scan of EMPLOYEES sorting the values of LAST_NAME within DEPARTMENT_ID.

Using the Oracle 'HR' demo schema (where EMPLOYEES has 107 rows) and the SQL*Plus set autotrace tool, we can see both perform 7 consistent gets and 1 sort.

SQL> set autotrace traceonly explain statistics

SQL> select department_id as "Dept."
  2       , listagg(last_name, '; ') within group(order by hire_date) as "Employees"
  3  from   employees
  4  group  by department_id
  5  order  by department_id;

12 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2107619104

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    11 |   209 |     4  (25)| 00:00:01 |
|   1 |  SORT GROUP BY     |           |    11 |   209 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMPLOYEES |   107 |  2033 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1605  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         12  rows processed

SQL> select department_id as "Dept."
  2       , listagg(last_name, '; ') within group(order by hire_date) over(partition by department_id) as "Employees"
  3  from   employees
  4  order  by department_id;

107 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1919783947

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |   107 |  2033 |     3   (0)| 00:00:01 |
|   1 |  WINDOW SORT       |           |   107 |  2033 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMPLOYEES |   107 |  2033 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       3703  bytes sent via SQL*Net to client
        685  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        107  rows processed

There is more network activity for the second query as it returns more rows.

Listagg Partition instead of Group By

Tags:

oracle-database

sql

AJQShake

2 Answers

MT0

William Robertson

Recent Activity

Donate For Us

Listagg Partition instead of Group By

Tags:

oracle-database

sql

AJQShake

2 Answers

MT0

William Robertson

Related questions

Recent Activity

Donate For Us