
SQL Performance: SELECT DISTINCT versus GROUP BY

I have been trying to improve query times for an existing Oracle database-driven application that has been running a little sluggish. The application executes several large queries, such as the one below, which can take over an hour to run. Replacing the DISTINCT with a GROUP BY clause in the query below shrank execution time from 100 minutes to 10 seconds. My understanding was that SELECT DISTINCT and GROUP BY operated in pretty much the same way. Why such a huge disparity between execution times? What is the difference in how the query is executed on the back-end? Is there ever a situation where SELECT DISTINCT runs faster?

Note: In the following query, WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A' is just one of several ways the results can be filtered. It is included to show why the query joins tables whose columns do not appear in the SELECT list; this particular filter returns about a tenth of all available data.
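Joining tables whose columns are not selected can still multiply rows when the relationship is one-to-many, which is presumably why this query needs DISTINCT or GROUP BY at all. Below is a minimal sketch using Python's built-in sqlite3 module and two hypothetical tables (items, job_inventory) that are not the application's real schema. SQLite stands in for Oracle here only to show the row counts, not the performance.

```python
import sqlite3

# Hypothetical toy tables (not the real schema): item 1 has three
# JOB_INVENTORY rows, item 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_code TEXT);
    CREATE TABLE job_inventory (job_item_id INTEGER PRIMARY KEY, item_id INTEGER);
    INSERT INTO items VALUES (1, 'A'), (2, 'B');
    INSERT INTO job_inventory VALUES (10, 1), (11, 1), (12, 1), (13, 2);
""")

# Joining to a one-to-many table repeats the item once per matching row,
# even though no job_inventory column is selected.
fanned_out = conn.execute("""
    SELECT items.item_id, items.item_code
    FROM items
    JOIN job_inventory ON items.item_id = job_inventory.item_id
    ORDER BY items.item_id
""").fetchall()

# DISTINCT collapses the repeats back to one row per item.
deduped = conn.execute("""
    SELECT DISTINCT items.item_id, items.item_code
    FROM items
    JOIN job_inventory ON items.item_id = job_inventory.item_id
    ORDER BY items.item_id
""").fetchall()

print(fanned_out)  # [(1, 'A'), (1, 'A'), (1, 'A'), (2, 'B')]
print(deduped)     # [(1, 'A'), (2, 'B')]
```

In the real query every additional outer join to a one-to-many table can multiply the rows again before the duplicates are removed.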

SQL using DISTINCT:

```sql
SELECT DISTINCT
    ITEMS.ITEM_ID,
    ITEMS.ITEM_CODE,
    ITEMS.ITEMTYPE,
    ITEM_TRANSACTIONS.STATUS,
    (SELECT COUNT(PKID)
         FROM ITEM_PARENTS
         WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
    ) AS CHILD_COUNT
FROM
    ITEMS
    INNER JOIN ITEM_TRANSACTIONS
        ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
        AND ITEM_TRANSACTIONS.FLAG = 1
    LEFT OUTER JOIN ITEM_METADATA
        ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
    LEFT OUTER JOIN JOB_INVENTORY
        ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
    LEFT OUTER JOIN JOB_TASK_INVENTORY
        ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
    LEFT OUTER JOIN JOB_TASKS
        ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
    LEFT OUTER JOIN JOBS
        ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
    LEFT OUTER JOIN TASK_INVENTORY_STEP
        ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
    LEFT OUTER JOIN TASK_STEP_INFORMATION
        ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
WHERE
    TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
ORDER BY
    ITEMS.ITEM_CODE
```

SQL using GROUP BY:

```sql
SELECT
    ITEMS.ITEM_ID,
    ITEMS.ITEM_CODE,
    ITEMS.ITEMTYPE,
    ITEM_TRANSACTIONS.STATUS,
    (SELECT COUNT(PKID)
         FROM ITEM_PARENTS
         WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
    ) AS CHILD_COUNT
FROM
    ITEMS
    INNER JOIN ITEM_TRANSACTIONS
        ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
        AND ITEM_TRANSACTIONS.FLAG = 1
    LEFT OUTER JOIN ITEM_METADATA
        ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
    LEFT OUTER JOIN JOB_INVENTORY
        ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
    LEFT OUTER JOIN JOB_TASK_INVENTORY
        ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
    LEFT OUTER JOIN JOB_TASKS
        ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
    LEFT OUTER JOIN JOBS
        ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
    LEFT OUTER JOIN TASK_INVENTORY_STEP
        ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
    LEFT OUTER JOIN TASK_STEP_INFORMATION
        ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
WHERE
    TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
GROUP BY
    ITEMS.ITEM_ID,
    ITEMS.ITEM_CODE,
    ITEMS.ITEMTYPE,
    ITEM_TRANSACTIONS.STATUS
ORDER BY
    ITEMS.ITEM_CODE
```
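When the select list has no aggregates, DISTINCT and a GROUP BY over the same non-aggregated columns return the same rows. Here CHILD_COUNT depends only on ITEMS.ITEM_ID, which is grouped, so the two queries above should differ only in how they are executed. The following sketch checks this on hypothetical toy tables in SQLite, not Oracle. It includes a correlated CHILD_COUNT subquery like the original, but it says nothing about Oracle's plans or timings.

```python
import sqlite3

# Hypothetical toy tables; item 3 has no children in item_parents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_code TEXT);
    CREATE TABLE job_inventory (job_item_id INTEGER PRIMARY KEY, item_id INTEGER);
    CREATE TABLE item_parents (pkid INTEGER PRIMARY KEY, parent_item_id INTEGER);
    INSERT INTO items VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO job_inventory VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 3);
    INSERT INTO item_parents VALUES (100, 1), (101, 1), (102, 2);
""")

# Shared select list (with the correlated CHILD_COUNT subquery) and FROM clause.
select_list = """
    items.item_id,
    items.item_code,
    (SELECT COUNT(pkid) FROM item_parents
     WHERE parent_item_id = items.item_id) AS child_count
"""
from_clause = """
    FROM items
    JOIN job_inventory ON items.item_id = job_inventory.item_id
"""

distinct_rows = conn.execute(
    "SELECT DISTINCT" + select_list + from_clause + " ORDER BY items.item_code"
).fetchall()

group_by_rows = conn.execute(
    "SELECT" + select_list + from_clause
    + " GROUP BY items.item_id, items.item_code ORDER BY items.item_code"
).fetchall()

print(distinct_rows)  # [(1, 'A', 2), (2, 'B', 1), (3, 'C', 0)]
print(group_by_rows)  # [(1, 'A', 2), (2, 'B', 1), (3, 'C', 0)]
```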

Here is the Oracle query plan for the query using DISTINCT:

[Image: Oracle query plan for the query using DISTINCT]

Here is the Oracle query plan for the query using GROUP BY:

[Image: Oracle query plan for the query using GROUP BY]

woemler asked Dec 19 '12 16:12



1 Answer

The performance difference is probably due to the execution of the subquery in the SELECT clause. I am guessing that Oracle re-executes this subquery for every joined row before the DISTINCT is applied. With GROUP BY, it would execute once per group, after the grouping.
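The hypothesis (subquery evaluated on every joined row before DISTINCT, but once per group with GROUP BY) can be made concrete by counting subquery evaluations under the two assumed orders of operations. This is a pure-Python model with made-up rows and a hypothetical `child_count` stand-in for the correlated COUNT. It is not a trace of Oracle; the real order is whatever Oracle's execution plan does.

```python
from collections import Counter

# Made-up joined rows: (item_id, item_code, status). Items repeat because
# the outer joins fan out.
joined_rows = [
    (1, "A", "OPEN"), (1, "A", "OPEN"), (1, "A", "OPEN"),
    (2, "B", "OPEN"), (2, "B", "OPEN"),
    (3, "C", "CLOSED"),
]
# Made-up ITEM_PARENTS data: the parent_item_id of each child row.
parent_ids = [1, 1, 2]

calls = Counter()

def child_count(item_id, strategy):
    """Hypothetical stand-in for the correlated COUNT subquery; records each call."""
    calls[strategy] += 1
    return sum(1 for p in parent_ids if p == item_id)

# Assumed DISTINCT order: evaluate the subquery on every joined row, then de-duplicate.
distinct_result = sorted(
    {(i, code, status, child_count(i, "distinct")) for i, code, status in joined_rows}
)

# Assumed GROUP BY order: collapse rows into groups first, then evaluate once per group.
groups = sorted(set(joined_rows))
group_by_result = [
    (i, code, status, child_count(i, "group_by")) for i, code, status in groups
]

print(calls["distinct"], calls["group_by"])  # 6 3
```

Both orders produce the same rows; only the number of subquery evaluations differs, and in the real query that gap grows with the fan-out of the joins.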

Try replacing the subquery with a join instead:

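```sql
select . . .,
       coalesce(p.parentcnt, 0) as parentcnt  -- items with no children get 0, as the original COUNT subquery returned
from . . .
     left outer join
     (SELECT PARENT_ITEM_ID, COUNT(PKID) as parentcnt
      FROM ITEM_PARENTS
      GROUP BY PARENT_ITEM_ID  -- required: PARENT_ITEM_ID is not aggregated
     ) p
     on items.item_id = p.parent_item_id
```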
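Here is a toy SQLite check of this join rewrite against the correlated subquery, using hypothetical tables. The derived table is grouped by parent_item_id, since SQL requires a GROUP BY there when parent_item_id is not aggregated. COALESCE maps items without children to 0, which matches what the correlated COUNT returns. This checks results only, not Oracle performance.

```python
import sqlite3

# Hypothetical toy tables; item 3 has no rows in item_parents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_code TEXT);
    CREATE TABLE item_parents (pkid INTEGER PRIMARY KEY, parent_item_id INTEGER);
    INSERT INTO items VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO item_parents VALUES (100, 1), (101, 1), (102, 2);
""")

# Original form: correlated scalar subquery in the select list.
correlated = conn.execute("""
    SELECT items.item_id,
           (SELECT COUNT(pkid) FROM item_parents
            WHERE parent_item_id = items.item_id) AS child_count
    FROM items
    ORDER BY items.item_id
""").fetchall()

# Rewritten form: aggregate item_parents once in a derived table, then outer-join it.
# COALESCE turns the NULL produced for unmatched items back into 0.
joined = conn.execute("""
    SELECT items.item_id,
           COALESCE(p.parentcnt, 0) AS child_count
    FROM items
    LEFT OUTER JOIN (SELECT parent_item_id, COUNT(pkid) AS parentcnt
                     FROM item_parents
                     GROUP BY parent_item_id) p
      ON items.item_id = p.parent_item_id
    ORDER BY items.item_id
""").fetchall()

print(correlated)  # [(1, 2), (2, 1), (3, 0)]
print(joined)      # [(1, 2), (2, 1), (3, 0)]
```

Without COALESCE, item 3 would come back as None (SQL NULL) rather than 0, because the outer join finds no matching row in the derived table.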
Gordon Linoff answered Oct 13 '22 16:10