I have received a SQL query that makes use of the distinct
keyword. When I tried running the query it took at least a minute to join two tables with hundreds of thousands of records and actually return something.
I then took out the distinction and it came back in 0.2 seconds. Does the distinct keyword really make things that bad?
Here's the query:
SELECT DISTINCT
c.username, o.orderno, o.totalcredits, o.totalrefunds,
o.recstatus, o.reason
FROM management.contacts c
JOIN management.orders o ON (c.custID = o.custID)
WHERE o.recDate > to_date('2010-01-01', 'YYYY/MM/DD')
Yes, basically it has to sort the results and then re-processed to eliminate the duplicates. This cull could also be being done during the sort, but we can only speculate as to how exactly the code works in the background. You could try and improve the performance by creating an index composed of all three (3) fields.
Very few queries may perform faster in SELECT DISTINCT mode, and very few will perform slower (but not significantly slower) in SELECT DISTINCT mode but for the later case it is likely that the application may need to examine the duplicate cases, which shifts the performance and complexity burden to the application.
As a general rule, SELECT DISTINCT incurs a fair amount of overhead for the query. Hence, you should avoid it or use it sparingly. The idea of generating duplicate rows using JOIN just to remove them with SELECT DISTINCT is rather reminiscent of Sisyphus pushing a rock up a hill, only to have it roll back down again.
Yes, as using DISTINCT
will (sometimes according to a comment) cause results to be ordered. Sorting hundreds of records takes time.
Try GROUP BY
all your columns, it can sometimes lead the query optimiser to choose a more efficient algorithm (at least with Oracle I noticed significant performance gain).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With