Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Listagg Partition instead of Group By

Which is more efficient of the 2, specifically in the case when using Listagg with varchar type data?

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) AS "Employees"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

OR

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) 
       OVER (PARTITION BY department_id) AS "Employees"
  FROM employees
  ORDER BY department_id;

For me, unless I'm only selecting 1/2 columns, I'll always use partition as I don't have to include each column in the 'GROUP BY' clause.

like image 730
AJQShake Avatar asked Oct 26 '25 09:10

AJQShake


2 Answers

The two queries give different outputs. Using GROUP BY will return one row-per-group whilst using OVER ( PARTITION BY .. ) will return all the rows and duplicate the LISTAGG result for each row in the partition.

Use whichever solution is more appropriate for your desired output - but they are not equivalent.

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE employees ( department_id, last_name, hire_date ) AS
SELECT 1, 'AAA', DATE '2018-01-01' FROM DUAL UNION ALL
SELECT 1, 'BBB', DATE '2018-01-02' FROM DUAL UNION ALL
SELECT 1, 'CCC', DATE '2018-01-03' FROM DUAL UNION ALL
SELECT 2, 'DDD', DATE '2018-01-01' FROM DUAL UNION ALL
SELECT 2, 'EEE', DATE '2018-01-02' FROM DUAL;

Query 1:

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) AS "Employees"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id

Results:

| Dept. |     Employees |
|-------|---------------|
|     1 | AAA; BBB; CCC |
|     2 |      DDD; EEE |

Query 2:

SELECT department_id AS "Dept.",
       LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) 
       OVER (PARTITION BY department_id) AS "Employees"
  FROM employees
  ORDER BY department_id

Results:

| Dept. |     Employees |
|-------|---------------|
|     1 | AAA; BBB; CCC |
|     1 | AAA; BBB; CCC |
|     1 | AAA; BBB; CCC |
|     2 |      DDD; EEE |
|     2 |      DDD; EEE |
like image 79
MT0 Avatar answered Oct 29 '25 00:10

MT0


Both queries do the same amount of work, as each one has to perform a full scan of EMPLOYEES sorting the values of LAST_NAME within DEPARTMENT_ID.

Using the Oracle 'HR' demo schema (where EMPLOYEES has 107 rows) and the SQL*Plus set autotrace tool, we can see both perform 7 consistent gets and 1 sort.

SQL> set autotrace traceonly explain statistics

SQL> select department_id as "Dept."
  2       , listagg(last_name, '; ') within group(order by hire_date) as "Employees"
  3  from   employees
  4  group  by department_id
  5  order  by department_id;

12 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2107619104

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    11 |   209 |     4  (25)| 00:00:01 |
|   1 |  SORT GROUP BY     |           |    11 |   209 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMPLOYEES |   107 |  2033 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1605  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         12  rows processed

SQL> select department_id as "Dept."
  2       , listagg(last_name, '; ') within group(order by hire_date) over(partition by department_id) as "Employees"
  3  from   employees
  4  order  by department_id;

107 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1919783947

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |   107 |  2033 |     3   (0)| 00:00:01 |
|   1 |  WINDOW SORT       |           |   107 |  2033 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMPLOYEES |   107 |  2033 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       3703  bytes sent via SQL*Net to client
        685  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        107  rows processed

There is more network activity for the second query as it returns more rows.

like image 38
William Robertson Avatar answered Oct 29 '25 00:10

William Robertson



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!