Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Distinct keyword bogs down performance?

Tags:

sql

plsql

I have received a SQL query that makes use of the distinct keyword. When I tried running the query it took at least a minute to join two tables with hundreds of thousands of records and actually return something.

I then took out the distinction and it came back in 0.2 seconds. Does the distinct keyword really make things that bad?

Here's the query:

SELECT DISTINCT
    c.username, o.orderno, o.totalcredits, o.totalrefunds,
    o.recstatus, o.reason 
FROM management.contacts c 
    JOIN management.orders o ON (c.custID = o.custID)
WHERE o.recDate > to_date('2010-01-01', 'YYYY/MM/DD')
like image 352
MxLDevs Avatar asked Mar 17 '11 15:03

MxLDevs


People also ask

Does distinct affect query performance?

Yes, basically it has to sort the results and then re-processed to eliminate the duplicates. This cull could also be being done during the sort, but we can only speculate as to how exactly the code works in the background. You could try and improve the performance by creating an index composed of all three (3) fields.

Does using distinct slow down query?

Very few queries may perform faster in SELECT DISTINCT mode, and very few will perform slower (but not significantly slower) in SELECT DISTINCT mode but for the later case it is likely that the application may need to examine the duplicate cases, which shifts the performance and complexity burden to the application.

Why you shouldn't use SELECT distinct?

As a general rule, SELECT DISTINCT incurs a fair amount of overhead for the query. Hence, you should avoid it or use it sparingly. The idea of generating duplicate rows using JOIN just to remove them with SELECT DISTINCT is rather reminiscent of Sisyphus pushing a rock up a hill, only to have it roll back down again.


1 Answers

Yes, as using DISTINCT will (sometimes according to a comment) cause results to be ordered. Sorting hundreds of records takes time.

Try GROUP BY all your columns, it can sometimes lead the query optimiser to choose a more efficient algorithm (at least with Oracle I noticed significant performance gain).

like image 57
Benoit Avatar answered Oct 15 '22 03:10

Benoit