SQL Performance: SELECT DISTINCT versus GROUP BY

I have been trying to improve query times for an existing Oracle database-driven application that has been running a little sluggishly. The application executes several large queries, such as the one below, which can take over an hour to run. Replacing the DISTINCT with a GROUP BY clause in the query below cut execution time from 100 minutes to 10 seconds. My understanding was that SELECT DISTINCT and GROUP BY operate in much the same way. Why is there such a huge disparity between execution times? How does the database execute the two queries differently on the back end? Is there ever a situation where SELECT DISTINCT runs faster?

Note: In the following query, WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A' represents just one of a number of ways that results can be filtered. I include it to show why the query joins tables whose columns do not appear in the SELECT list. This particular filter returns about a tenth of all available data.

SQL using DISTINCT:

    SELECT DISTINCT
        ITEMS.ITEM_ID,
        ITEMS.ITEM_CODE,
        ITEMS.ITEMTYPE,
        ITEM_TRANSACTIONS.STATUS,
        (SELECT COUNT(PKID)
            FROM ITEM_PARENTS
            WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
        ) AS CHILD_COUNT
    FROM
        ITEMS
        INNER JOIN ITEM_TRANSACTIONS
            ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
            AND ITEM_TRANSACTIONS.FLAG = 1
        LEFT OUTER JOIN ITEM_METADATA
            ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
        LEFT OUTER JOIN JOB_INVENTORY
            ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
        LEFT OUTER JOIN JOB_TASK_INVENTORY
            ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
        LEFT OUTER JOIN JOB_TASKS
            ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
        LEFT OUTER JOIN JOBS
            ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
        LEFT OUTER JOIN TASK_INVENTORY_STEP
            ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
        LEFT OUTER JOIN TASK_STEP_INFORMATION
            ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
    WHERE
        TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
    ORDER BY
        ITEMS.ITEM_CODE

SQL using GROUP BY:

    SELECT
        ITEMS.ITEM_ID,
        ITEMS.ITEM_CODE,
        ITEMS.ITEMTYPE,
        ITEM_TRANSACTIONS.STATUS,
        (SELECT COUNT(PKID)
            FROM ITEM_PARENTS
            WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
        ) AS CHILD_COUNT
    FROM
        ITEMS
        INNER JOIN ITEM_TRANSACTIONS
            ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
            AND ITEM_TRANSACTIONS.FLAG = 1
        LEFT OUTER JOIN ITEM_METADATA
            ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
        LEFT OUTER JOIN JOB_INVENTORY
            ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
        LEFT OUTER JOIN JOB_TASK_INVENTORY
            ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
        LEFT OUTER JOIN JOB_TASKS
            ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
        LEFT OUTER JOIN JOBS
            ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
        LEFT OUTER JOIN TASK_INVENTORY_STEP
            ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
        LEFT OUTER JOIN TASK_STEP_INFORMATION
            ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
    WHERE
        TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
    GROUP BY
        ITEMS.ITEM_ID,
        ITEMS.ITEM_CODE,
        ITEMS.ITEMTYPE,
        ITEM_TRANSACTIONS.STATUS
    ORDER BY
        ITEMS.ITEM_CODE

Here is the Oracle query plan for the query using DISTINCT:

[Image: Oracle query plan for query using DISTINCT]

Here is the Oracle query plan for the query using GROUP BY:

[Image: Oracle query plan for query using GROUP BY]

asked Dec 19 '12 16:12

woemler

1 Answer

The performance difference is probably due to the execution of the subquery in the SELECT clause. I am guessing that, with DISTINCT, the subquery is re-executed for every row the joins produce, before the duplicates are removed. With GROUP BY, it would be executed only after the grouping, once per group.
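One way to check this guess is to ask Oracle for the plan of each form and see where the subquery on ITEM_PARENTS sits relative to the duplicate-removal step. The sketch below uses EXPLAIN PLAN and DBMS_XPLAN.DISPLAY; the query is cut down to two tables purely for illustration (an assumption, not the full query from the question), so the real plans may differ.

```sql
-- Store the optimizer's plan for a reduced DISTINCT query in PLAN_TABLE.
-- The reduced query is an illustrative assumption; substitute the full one.
EXPLAIN PLAN FOR
SELECT DISTINCT
    ITEMS.ITEM_ID,
    ITEMS.ITEM_CODE,
    (SELECT COUNT(PKID)
        FROM ITEM_PARENTS
        WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
    ) AS CHILD_COUNT
FROM
    ITEMS
    INNER JOIN ITEM_TRANSACTIONS
        ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
        AND ITEM_TRANSACTIONS.FLAG = 1;

-- Display the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Repeating the same two statements with the GROUP BY form gives a second plan to compare. EXPLAIN PLAN shows only the optimizer's estimate; to see how often each step actually ran, DBMS_XPLAN.DISPLAY_CURSOR can report runtime statistics for a statement that has already been executed.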

Try replacing it with a join, instead:

    select . . .,
           parentcnt
    from . . . left outer join
         (SELECT PARENT_ITEM_ID, COUNT(PKID) as parentcnt
          FROM ITEM_PARENTS
          GROUP BY PARENT_ITEM_ID
         ) p
         on items.item_id = p.parent_item_id
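Applied to the query from the question, the join suggestion might look like the sketch below. It is untested against the original schema and rests on two assumptions the answer does not spell out: the derived table needs a GROUP BY PARENT_ITEM_ID to yield one count per parent, and COALESCE is used so that items without children still report 0, as the correlated subquery did. The alias P and the column name PARENTCNT follow the answer.

```sql
SELECT
    ITEMS.ITEM_ID,
    ITEMS.ITEM_CODE,
    ITEMS.ITEMTYPE,
    ITEM_TRANSACTIONS.STATUS,
    -- The outer join yields NULL for items without children; report 0 instead.
    COALESCE(P.PARENTCNT, 0) AS CHILD_COUNT
FROM
    ITEMS
    INNER JOIN ITEM_TRANSACTIONS
        ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
        AND ITEM_TRANSACTIONS.FLAG = 1
    -- Count children once per parent rather than once per joined row.
    LEFT OUTER JOIN (
        SELECT PARENT_ITEM_ID, COUNT(PKID) AS PARENTCNT
        FROM ITEM_PARENTS
        GROUP BY PARENT_ITEM_ID
    ) P
        ON ITEMS.ITEM_ID = P.PARENT_ITEM_ID
    LEFT OUTER JOIN ITEM_METADATA
        ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
    LEFT OUTER JOIN JOB_INVENTORY
        ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
    LEFT OUTER JOIN JOB_TASK_INVENTORY
        ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
    LEFT OUTER JOIN JOB_TASKS
        ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
    LEFT OUTER JOIN JOBS
        ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
    LEFT OUTER JOIN TASK_INVENTORY_STEP
        ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
    LEFT OUTER JOIN TASK_STEP_INFORMATION
        ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
WHERE
    TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
GROUP BY
    ITEMS.ITEM_ID,
    ITEMS.ITEM_CODE,
    ITEMS.ITEMTYPE,
    ITEM_TRANSACTIONS.STATUS,
    -- The count is now an ordinary joined column, so it must be grouped too.
    P.PARENTCNT
ORDER BY
    ITEMS.ITEM_CODE
```

Because the count now comes from a joined column, P.PARENTCNT has to appear in the GROUP BY list. The same SELECT list should also work with DISTINCT in place of GROUP BY, since there is no longer a scalar subquery to evaluate per row.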

answered Oct 13 '22 16:10

Gordon Linoff
