
SQL: Removing duplicate records, albeit of a different kind

Consider the following table:

T6
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

I consider the records {1, 2, A} and {2, 1, A} to be duplicates: the values of A and B are swapped and C is the same. I need to select only one of them, producing either of the record sets below:

         A          B C                      A          B C
---------- ---------- -             ---------- ---------- -
         1          2 A         or           2          1 A
         2          3 C                      2          3 C
         3          4 D                      3          4 D

I tried the queries below, but to no avail.

select t1.*
from t6 t1
, t6 t2
where t1.a <> t2.b
and t1.b <> t2.a
and t1.rowid <> t2.rowid
/

         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          1 A
         2          3 C
         3          4 D
         3          4 D

6 rows selected.

Or even this:

 select *
 from t6 t1
 where exists (select * from t6 t2 where t1.a <> t2.b and t1.b <> t2.a)
/
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

Neither worked.

The database is Oracle 10g. I am looking for a pure SQL solution. Any help is appreciated.

Asked Jan 12 '12 at 04:01 by G P


1 Answer

Use the LEAST() and GREATEST() functions to put each pair of values into a canonical order, so that {1, 2} and {2, 1} produce the same row. Then use DISTINCT to winnow out the duplicates.

select distinct least(a, b) as a
       , greatest(a, b) as b
       , c
from t6 

This gives you the precise record set you asked for. But things will get more complicated if you need to include other columns from T6.
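
One possible way to handle that case is an analytic ROW_NUMBER() partitioned by the same LEAST()/GREATEST() expressions, keeping one complete row per group. This is a sketch rather than part of the original answer; the alias rn and the choice to keep the row with the lowest A, then B, are illustrative assumptions.

```sql
-- Sketch: keep one whole row from each group of mirrored (a, b) pairs.
-- rn and the ORDER BY tie-break are illustrative choices.
select a, b, c
from ( select t.*
            , row_number() over ( partition by least(a, b)
                                             , greatest(a, b)
                                             , c
                                  order by a, b ) as rn
       from t6 t )
where rn = 1
/
```

Because the outer query reads the original columns, any extra columns from T6 can be added to its select list. Note that LEAST() and GREATEST() return NULL if either argument is NULL, so rows with a NULL in A or B would all fall into the same partition for a given C.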


"But I was wondering if this will work for VARCHAR2 fields also?"

Yes, but it will compare strings by their character values in the database character set (ASCII order, for plain ASCII data), which is not always what you might expect (or desire).
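
As a small illustration (a sketch, assuming an ASCII-based database character set and the default binary comparison), uppercase letters have lower character codes than lowercase ones, so a capitalised value is treated as the smaller one:

```sql
-- 'B' (code 66) sorts before 'a' (code 97) in ASCII,
-- so LEAST() is expected to pick 'Banana' here, not 'apple'.
select least('apple', 'Banana')    as lo
     , greatest('apple', 'Banana') as hi
from dual
/
```

If case should not matter, one option is to compare normalised values, for example least(upper(a), upper(b)), at the cost of returning the upper-cased text.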

"Also, my table T6 might have tens of thousand of records."

That really isn't a lot of data in today's terms. The DISTINCT will cause a sort, which should be able to fit in memory unless A and B are really long VARCHAR2 columns - but probably even then.

If this is a query you're going to run a lot, then you can build a function-based index to satisfy it:

create index t6_fbi on t6(least(a, b)
                           , greatest(a, b)
                           , c )
/

But I would really only bother if you have a genuine performance issue with the query.
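
One way to check whether the query is a problem, and whether such an index would actually be used, is to look at the execution plan. The following is a sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. Whether the optimizer chooses T6_FBI depends on statistics and on whether the columns are nullable, so the plan may still show a full table scan.

```sql
-- Sketch: the query must use the same expressions as the index definition
-- for the function-based index to be considered at all.
explain plan for
select distinct least(a, b) as a
       , greatest(a, b) as b
       , c
from t6
/
-- Show the plan that was just generated.
select * from table(dbms_xplan.display)
/
```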

Answered Sep 21 '22 at 22:09 by APC