Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

oracle distinct doing sort [closed]

Tags:

sql

oracle

I'm getting poor performance from DISTINCT. The explain plan indicates that it is doing SORT (GROUP BY) which doesn't sound right. I would expect some kind of HASH aggregation to produce much better result. Is there a hint to tell oracle to use HASH for DISTINCT rather than sort? I've used /*+ USE_HASH_AGGREGATION */ in similar situations, but it is not working for DISTINCT.

So this is my original query:

SELECT
count(distinct userid) n, col
FROM users
GROUP BY col;

users has 30M rows, each userid is there 12 times. This query takes 70 seconds.

Now we rewrite it as

SELECT
count(userid) n, col
FROM
(SELECT distinct userid, col FROM users)
GROUP BY col

And it takes 40 seconds. Now add the hint to do hash instead of sort:

SELECT
count(userid) n, col
FROM
(SELECT /*+ USE_HASH_AGGREGATION */ distinct userid, col FROM users)
GROUP BY col

and it takes 10 seconds.

If somebody can explain to me why this is happening or how I can beat the first simple query into working as good as the 3rd one, that would be fantastic.
The reason I care about query simplicity is because these queries are actually generated.

Plans: 1) Slow:

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation      | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |               |      1 |        |      5 |00:01:12.01 |     283K|    292K|       |       |      |     |
|   1 |  SORT GROUP BY     |               |      1 |      5 |      5 |00:01:12.01 |     283K|    292K|   194M|   448K|  172M (0)|   73728 |
|   2 |   TABLE ACCESS FULL| USERS |      1 |     29M|     29M|00:00:08.17 |     283K|    283K|       |       |      |     |

2) Fast

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |               |      1 |        |      5 |00:00:13.09 |     283K|    283K|   |   |      |
|   1 |  SORT GROUP BY       |               |      1 |      5 |      5 |00:00:13.09 |     283K|    283K|  3072 |  3072 | 2048  (0)|
|   2 |   VIEW               |               |      1 |   8647K|   2445K|00:00:13.16 |     283K|    283K|   |   |      |
|   3 |    HASH UNIQUE       |               |      1 |   8647K|   2445K|00:00:12.57 |     283K|    283K|   113M|    10M|  216M (0)|
|   4 |     TABLE ACCESS FULL| USERS         |      1 |     29M|     29M|00:00:07.68 |     283K|    283K|   |   |      |
--------------------------------------------------------------------------------------------------------------------------------------------
like image 859
MK. Avatar asked Feb 14 '12 16:02

MK.


People also ask

Does distinct SQL sort?

No. There are a number of circumstances in which a DISTINCT in Oracle does not imply a sort, the most important of which is the hashing algorithm used in 10g+ for both group by and distinct operations. Always specify ORDER BY if you want an ordered result set, even in 9i and below.

Can you use distinct and ORDER BY together?

Without a transformation, a statement that contains both DISTINCT and ORDER BY would require two separate sorting steps-one to satisfy DISTINCT and one to satisfy ORDER BY. (Currently, Derby uses sorting to evaluate DISTINCT.

Which is faster distinct or group by in Oracle?

I personally wouldn't consider performance, but what is semantically correct. DISTINCT implies you want a distinct set of columns. However, GROUP BY implies you want to compute some sort of aggregate value which you are not.

What does distinct mean in Oracle?

The DISTINCT clause is used in a SELECT statement to filter duplicate rows in the result set. It ensures that rows returned are unique for the column or columns specified in the SELECT clause.


1 Answers

How about trying the following: If you had an index on col and userid it should resolve completely in the index and not need to touch the table at all.

Select count(userid) n, col
from (select col, userid from users group by col, userid)
group by col
;
like image 149
Roger Cornejo Avatar answered Oct 19 '22 10:10

Roger Cornejo