Oracle DISTINCT doing sort [closed]

Tags: sql, oracle

I'm getting poor performance from DISTINCT. The explain plan shows a SORT (GROUP BY) operation, which doesn't sound right: I would expect some kind of HASH aggregation to perform much better. Is there a hint that tells Oracle to use HASH for DISTINCT rather than a sort? I've used /*+ USE_HASH_AGGREGATION */ in similar situations, but it is not working for DISTINCT.

So this is my original query:

SELECT
count(distinct userid) n, col
FROM users
GROUP BY col;

users has 30M rows, and each userid appears 12 times. This query takes 70 seconds.

Now we rewrite it as

SELECT
count(userid) n, col
FROM
(SELECT distinct userid, col FROM users)
GROUP BY col

And it takes 40 seconds. Now add the hint to do hash instead of sort:

SELECT
count(userid) n, col
FROM
(SELECT /*+ USE_HASH_AGGREGATION */ distinct userid, col FROM users)
GROUP BY col

and it takes 10 seconds.

If somebody can explain why this is happening, or how I can beat the first simple query into working as well as the third one, that would be fantastic.
The reason I care about query simplicity is that these queries are actually generated.

Plans: 1) Slow:

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |      5 |00:01:12.01 |     283K|    292K|       |       |          |         |
|   1 |  SORT GROUP BY     |       |      1 |      5 |      5 |00:01:12.01 |     283K|    292K|   194M|   448K|  172M (0)|   73728 |
|   2 |   TABLE ACCESS FULL| USERS |      1 |     29M|     29M|00:00:08.17 |     283K|    283K|       |       |          |         |
------------------------------------------------------------------------------------------------------------------------------------

2) Fast

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |      5 |00:00:13.09 |     283K|    283K|       |       |          |
|   1 |  SORT GROUP BY       |       |      1 |      5 |      5 |00:00:13.09 |     283K|    283K|  3072 |  3072 | 2048  (0)|
|   2 |   VIEW               |       |      1 |   8647K|   2445K|00:00:13.16 |     283K|    283K|       |       |          |
|   3 |    HASH UNIQUE       |       |      1 |   8647K|   2445K|00:00:12.57 |     283K|    283K|   113M|    10M|  216M (0)|
|   4 |     TABLE ACCESS FULL| USERS |      1 |     29M|     29M|00:00:07.68 |     283K|    283K|       |       |          |
----------------------------------------------------------------------------------------------------------------------------
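
To make the sort-versus-hash expectation concrete, here is a rough conceptual sketch in Python. It is not how Oracle implements SORT GROUP BY or HASH UNIQUE; the function names and toy rows are made up for illustration. The sort-based version has to order every input row, duplicates included, before it can count, while the hash-based version drops each duplicate as soon as it sees it.

```python
from collections import defaultdict
from itertools import groupby


def count_distinct_sort(rows):
    """Sort-based: order all (col, userid) rows, then count each run of equal pairs once."""
    ordered = sorted(rows)  # every input row takes part in the sort, duplicates included
    counts = defaultdict(int)
    for (col, _userid), _duplicates in groupby(ordered):
        counts[col] += 1  # one run of identical pairs is one distinct userid for this col
    return dict(counts)


def count_distinct_hash(rows):
    """Hash-based: remember pairs already seen, so duplicates are discarded on arrival."""
    seen = set()
    counts = defaultdict(int)
    for pair in rows:
        if pair not in seen:
            seen.add(pair)
            counts[pair[0]] += 1
    return dict(counts)


# Hypothetical toy data: each (col, userid) pair appears 12 times, as in the question.
rows = [pair for pair in [("a", 1), ("a", 2), ("b", 3)] for _ in range(12)]

print(count_distinct_sort(rows))
print(count_distinct_hash(rows))
```

Both functions return the same counts. The difference is only in how much data each one has to hold and reorder along the way.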


asked Feb 14 '12 16:02

MK.

1 Answer

How about trying the following? If you had an index on col and userid, the query should be resolved entirely from the index, without needing to touch the table at all.

Select count(userid) n, col
from (select col, userid from users group by col, userid)
group by col
;
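
As a quick sanity check that this rewrite returns the same counts as the original query, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Oracle. That substitution is an assumption made purely so the example can run anywhere: the toy data and the index name users_col_userid_ix are hypothetical, and SQLite tells you nothing about Oracle's plans, hints, or timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, col TEXT)")

# Hypothetical toy data: each (userid, col) pair appears 12 times, as in the question.
# One NULL userid is included to show that all three forms ignore it in the count.
pairs = [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "b"), (None, "c")]
conn.executemany(
    "INSERT INTO users (userid, col) VALUES (?, ?)",
    [pair for pair in pairs for _ in range(12)],
)

# Index on (col, userid) as suggested above; the name is made up.
conn.execute("CREATE INDEX users_col_userid_ix ON users (col, userid)")

queries = {
    "count_distinct": (
        "SELECT count(DISTINCT userid) n, col FROM users GROUP BY col ORDER BY col"
    ),
    "distinct_subquery": (
        "SELECT count(userid) n, col "
        "FROM (SELECT DISTINCT userid, col FROM users) GROUP BY col ORDER BY col"
    ),
    "group_by_subquery": (
        "SELECT count(userid) n, col "
        "FROM (SELECT col, userid FROM users GROUP BY col, userid) "
        "GROUP BY col ORDER BY col"
    ),
}

# Run each formulation and collect its rows so they can be compared.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

On this toy table all three formulations return identical rows, so the choice between them comes down to which plan the optimizer picks, not to correctness.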


answered Oct 19 '22 10:10

Roger Cornejo
