SQL: Removing Duplicate records - Albeit different kind

Tags: sql, oracle, duplicate-removal, duplicate-data

Consider the following table:

TAB6
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

I consider the records {1, 2, A} and {2, 1, A} to be duplicates. I need a query that produces either of the record sets below:

         A          B C                      A          B C
---------- ---------- -             ---------- ---------- -
         1          2 A         or           2          1 A
         2          3 C                      2          3 C
         3          4 D                      3          4 D

I tried the queries below, but to no avail.

select t1.*
from t6 t1
, t6 t2
where t1.a <> t2.b
and t1.b <> t2.a
and t1.rowid <> t2.rowid
/

         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          1 A
         2          3 C
         3          4 D
         3          4 D

6 rows selected.

Or even this:

 select *
 from t6 t1
 where exists (select * from t6 t2 where t1.a <> t2.b and t1.b <> t2.a)
/
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

Neither worked.

The database is Oracle 10g. I am looking for a pure SQL solution. Any help is appreciated.

asked Jan 12 '12 04:01

G P

1 Answer

Use the LEAST() and GREATEST() functions to put each pair of values into a canonical order (smaller value first), so that {1, 2} and {2, 1} become identical. Then use DISTINCT to winnow out the duplicates.

select distinct least(a, b) as a
       , greatest(a, b) as b
       , c
from t6

This gives you the precise record set you asked for. But things will get more complicated if you need to include other columns from T6.
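
One way to carry extra columns along, sketched here and not taken from the original answer, is to number the rows within each unordered pair and keep only the first. This assumes analytic functions are available (they are in Oracle 10g) and that keeping the row with the smaller A is acceptable; the ORDER BY inside the window is an arbitrary tie-breaker chosen for illustration.

```sql
-- Keep one complete row per unordered (a, b) pair and c.
select a, b, c
from (
       select t.*
            , row_number() over (
                partition by least(a, b), greatest(a, b), c
                order by a, b   -- assumption: prefer the row with the smaller A
              ) as rn
       from t6 t
     )
where rn = 1
/
```

Against the sample data this should return {1, 2, A}, {2, 3, C} and {3, 4, D}. Replace the outer column list with whichever columns of T6 you need.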

"But I was wondering if this will work for VARCHAR2 fields also?"

Yes, but by default it compares character codes (essentially ASCII order) and not dictionary order, which is not always what you might expect or desire.
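
A quick way to see this ordering in action is shown below. The second query is only a sketch: it assumes A and B are VARCHAR2 and that pairs should match regardless of case, which the question does not state. Note that it also returns the values in upper case.

```sql
-- Under the default binary comparison, 'B' (code 66) sorts before 'a' (code 97),
-- so this is expected to return 'B'.
select least('a', 'B') as smallest from dual
/

-- Assumption: case-insensitive matching is wanted, so normalise with UPPER() first.
select distinct least(upper(a), upper(b)) as a
       , greatest(upper(a), upper(b)) as b
       , c
from t6
/
```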

"Also, my table T6 might have tens of thousands of records."

That really isn't a lot of data in today's terms. The DISTINCT will cause a sort, which should be able to fit in memory unless A and B are really long VARCHAR2 columns - but probably even then.

If this is a query you're going to want to run a lot then you can build a function-based index to satisfy it:

create index t6_fbi on t6(least(a, b)
                           , greatest(a, b)
                           , c )
/

But I would really only bother if you have a genuine performance issue with the query.
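
To check whether there is such an issue, and whether the index is actually used, you can look at the execution plan. The following is a minimal sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY, both of which exist in Oracle 10g. What the plan shows will depend on your statistics and on whether the indexed expressions can be NULL, so treat the result as something to verify on your own data.

```sql
-- Ask the optimizer for its plan without running the query.
explain plan for
select distinct least(a, b) as a
       , greatest(a, b) as b
       , c
from t6
/

-- Display the plan that was just generated.
select * from table(dbms_xplan.display)
/
```

If the plan still shows a full scan of T6 after the index is created, the index is not helping this query.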

answered Sep 21 '22 22:09

APC
