In Oracle 12, we finally have a limit feature, so we can simply
select distinct columnname from mytable fetch first n rows only;
However, I am presently stuck with a previous version and would like to know how to achieve this result.
Ideally the query should return rows immediately, even for a huge table. It should return results as soon as the N distinct have been found, instead of processing all rows.
I think using
.. where rownum < XXX
should work
Perhaps
select * from (select distinct columnname from mytable) where rownum < XXX
No version of Oracle has a native syntax that returns a distinct Top N in an optimal way. This feature must be manually created using a PL/SQL pipelined function.
This script creates a table with one column, about 100 million rows, that uses roughly 1GB of space.
--drop table mytable purge;
create table mytable(columnname number not null) nologging;
insert /*+ append */ into mytable
select level from dual connect by level <= 100000;
commit;
begin
for i in 1 .. 10 loop
insert /*+ append */ into mytable select * from mytable;
commit;
end loop;
end;
/
begin
dbms_stats.gather_table_stats(user, 'MYTABLE');
end;
/
--1.25GB.
select bytes/1024/1024/1024 gb from dba_segments where segment_name = 'MYTABLE';
The new 12c syntax consistently runs in about 20 seconds to return a small amount of rows:
select distinct columnname from mytable fetch first 10 rows only;
The statement reads the entire table, hashes the entire table, and then grabs the top N rows:
explain plan for
select distinct columnname from mytable fetch first 10 rows only;
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 239985407
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | |
| 2 | WINDOW NOSORT STOPKEY| |
| 3 | VIEW | |
| 4 | HASH UNIQUE | |
| 5 | TABLE ACCESS FULL | MYTABLE |
------------------------------------------
The Oracle 11g version created by Ed Heal below surprisingly works better! It runs in about 12 seconds.
select * from (select distinct columnname from mytable) where rownum < 10;
Even though it's much faster, 12 seconds is still horrible. Regardless of my CPU or I/O performance, the algorithms must be wrong if it's taking seconds instead of milliseconds.
Indeed, this plan looks a little better. It has a SORT GROUP BY STOPKEY
low in the plan. This stops the query before it processes everything. But it's still stopping much too late. (Maybe Oracle is still reading the entire table but only sorting part of it?)
explain plan for
select * from (select distinct columnname from mytable) where rownum < 10;
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 3842480186
-------------------------------------------
| Id | Operation | Name |
-------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | COUNT STOPKEY | |
| 2 | VIEW | |
| 3 | SORT GROUP BY STOPKEY| |
| 4 | TABLE ACCESS FULL | MYTABLE |
-------------------------------------------
This is an ugly solution for several reasons. It requires new code and objects for different result sets. It may not scale well - the function has a collection to store previous results, what happens if that collection gets huge?
Every different result type requires new objects:
--Create an object to hold a record with the result columns.
--(Not necessary for this simple example since there's only one column, but will
-- be necessary if there are multiple columns.)
create or replace type columnname_rec is object
(
columnname number
);
--Create an object to hold a table of the records.
create or replace type columnname_tab is table of columnname_rec;
And a different function that returns a different type:
--Function that returns the distinct Top N as soon as they are found.
create or replace function fast_distinct_top_n(p_n number, p_cursor in sys_refcursor) return columnname_tab pipelined is
v_columnname number;
v_distinct_count number := 0;
type previous_values_type is table of varchar2(4000) index by varchar2(4000);
v_previous_values previous_values_type;
begin
loop
--Get new value.
fetch p_cursor into v_columnname;
--If the new value does not exist...
if not v_previous_values.exists(v_columnname) then
--Save the new value.
v_previous_values(v_columnname) := v_columnname;
--Increment the counter.
v_distinct_count := v_distinct_count + 1;
--Return the value
pipe row(columnname_rec(v_columnname));
--Exit if the counter is more than the top N.
exit when v_distinct_count >= p_n;
end if;
end loop;
end;
/
But finally we have created a query that returns distinct Top N results in milliseconds.
select * from table(fast_distinct_top_n(10, cursor(select * from mytable)));
If you can't stand creating so many objects there's probably a way to make it generic using Method4. But that solution would still be complicated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With