Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle : Identifying duplicates in a table without index

When I try to create a unique index on a large table, I get a unique contraint error. The unique index in this case is a composite key of 4 columns.

Is there an efficient way to identify the duplicates other than :

select col1, col2, col3, col4, count(*)
from Table1
group by col1, col2, col3, col4
having count(*) > 1

The explain plan above shows full table scan with extremely high cost, and just want to find if there is another way.

Thanks !

like image 548
Vj_plugs_in Avatar asked Mar 22 '10 20:03

Vj_plugs_in


2 Answers

Try creating a non-unique index on these four columns first. That will take O(n log n) time, but will also reduce the time needed to perform the select to O(n log n).

You're in a bit of a bind here -- any way you slice it, the entire table has to be read in at least once. The naïve algorithm runs in O(n2) time, unless the query optimizer is clever enough to build a temporary index/table.

like image 130
Jeffrey Hantin Avatar answered Sep 19 '22 13:09

Jeffrey Hantin


You can use the EXCEPTIONS INTO clause to trap the duplicated rows.

If you don't already have an EXCEPTIONS table create one using the provided script:

SQL>  @$ORACLE_HOME/rdbms/admin/ultexcpt.sql

Now you can attempt to create a unique constraint like this

alter table Table1
add  constraint tab1_uq UNIQUE (col1, col2, col3, col4)
exceptions into exceptions
/

This will fail but now your EXCEPTIONS table contains a list of all the rows whose keys contain duplicates, identified by ROWID. That gives you a basis for deciding what to do with the duplicates (delete, renumber, whatever).

edit

As others have noted you have to pay the cost of scanning the table once. This approach gives you a permanent set of the duplicated rows, and ROWID is the fastest way of accessing any given row.

like image 34
APC Avatar answered Sep 20 '22 13:09

APC