SQL deduplicate from two columns

Question

I've been struggling with this for quite some time, but I just can't figure it out.

I have a table with three columns: two containing names (autor1 and autor2) and a third (dld) containing the Damerau-Levenshtein distance (http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) between those names.

Each column includes every single name, which means every name in the autor1 column is also present in the autor2 column. As a result, I have twice as many rows as I need: each pair appears once as (autor1, autor2) and once more with the two columns swapped.

For example, row 3 is row 1 with the autor columns swapped, and the same goes for rows 2 and 4. How would I formulate a query that omits those "duplicates"?
id | autor1        | autor2        | dld
1  | Abel, Gustav  | Abel, Gustave | 1
2  | Abel, Gustav  | Abele, Gustav | 1
3  | Abel, Gustave | Abel, Gustav  | 1
4  | Abele, Gustav | Abel, Gustav  | 1

The desired result:
autor1       | autor2        | dld
Abel, Gustav | Abel, Gustave | 1
Abel, Gustav | Abele, Gustav | 1
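To make the redundancy concrete, here is a minimal Python sketch (not part of the original question) that treats each row as an unordered pair of names. Sorting the two names gives a row and its swapped copy the same key, so keeping the first row per key yields the desired output. The names `ROWS`, `pair_key` and `distinct_pairs` are hypothetical, chosen only for illustration.

```python
# Sample rows from the question: (id, autor1, autor2, dld).
ROWS = [
    (1, "Abel, Gustav", "Abel, Gustave", 1),
    (2, "Abel, Gustav", "Abele, Gustav", 1),
    (3, "Abel, Gustave", "Abel, Gustav", 1),
    (4, "Abele, Gustav", "Abel, Gustav", 1),
]


def pair_key(autor1: str, autor2: str) -> tuple[str, str]:
    """Order-independent key: the two names sorted, so (a, b) and (b, a) collide."""
    return (autor1, autor2) if autor1 <= autor2 else (autor2, autor1)


def distinct_pairs(rows):
    """Keep the first row seen for each unordered pair of names (id is dropped)."""
    seen = set()
    kept = []
    for _row_id, autor1, autor2, dld in rows:
        key = pair_key(autor1, autor2)
        if key not in seen:
            seen.add(key)
            kept.append((autor1, autor2, dld))
    return kept


if __name__ == "__main__":
    for autor1, autor2, dld in distinct_pairs(ROWS):
        print(f"{autor1} | {autor2} | {dld}")
```

The same canonical-ordering idea can be written in SQL as `WHERE autor1 < autor2`. This keeps one row per pair as long as no row pairs a name with itself and the database's collation never treats two different names as equal. On this sample, with plain binary string comparison, it keeps rows 1 and 2.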

Lieven Keersmaekers · Accepted Answer

Using NOT EXISTS works on every DBMS I know of. The tricky part is remembering to include a condition on id. Without it, both rows of each mirrored pair would exclude each other, and nothing would be returned.

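SELECT *
FROM   YourTable yto
WHERE  NOT EXISTS (
         -- look for the mirror row: the same pair with the columns swapped...
         SELECT  *
         FROM    YourTable yti
         WHERE   yti.autor1 = yto.autor2
                 AND yti.autor2 = yto.autor1
                 -- ...that comes later, so only the last row of each pair survives
                 AND yti.id > yto.id
       )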
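A minimal, self-contained way to try the query is Python's built-in `sqlite3` module with an in-memory database loaded with the four sample rows. The sketch matches the mirror row on both columns (`yti.autor1 = yto.autor2 AND yti.autor2 = yto.autor1`) and also runs a variant without the id clause to show that it returns nothing. SQLite and the helper name `run` are assumptions made for illustration; other DBMSs should behave the same way for this standard SQL.

```python
import sqlite3

# Sample data from the question; "YourTable" is the placeholder name used in the answer.
ROWS = [
    (1, "Abel, Gustav", "Abel, Gustave", 1),
    (2, "Abel, Gustav", "Abele, Gustav", 1),
    (3, "Abel, Gustave", "Abel, Gustav", 1),
    (4, "Abele, Gustav", "Abel, Gustav", 1),
]

# NOT EXISTS with the id clause: keep a row only if no later row is its mirror.
WITH_ID = """
SELECT id, autor1, autor2, dld
FROM   YourTable yto
WHERE  NOT EXISTS (
         SELECT *
         FROM   YourTable yti
         WHERE  yti.autor1 = yto.autor2
                AND yti.autor2 = yto.autor1
                AND yti.id > yto.id
       )
ORDER BY id
"""

# Same query without the id clause: every row has a mirror, so every row is rejected.
WITHOUT_ID = """
SELECT id, autor1, autor2, dld
FROM   YourTable yto
WHERE  NOT EXISTS (
         SELECT *
         FROM   YourTable yti
         WHERE  yti.autor1 = yto.autor2
                AND yti.autor2 = yto.autor1
       )
ORDER BY id
"""


def run(query: str) -> list:
    """Load the sample rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE YourTable "
            "(id INTEGER PRIMARY KEY, autor1 TEXT, autor2 TEXT, dld INTEGER)"
        )
        conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(WITH_ID))     # rows 3 and 4: the later copy of each pair
    print(run(WITHOUT_ID))  # []
```

With `yti.id > yto.id` the later copy of each pair survives (IDs 3 and 4). Flipping it to `yti.id < yto.id` keeps the earlier copy instead (IDs 1 and 2), which matches the layout asked for in the question.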

Edit

Step by step, here is a breakdown of the logic behind the statement:

Get the first record (ID = 1)
Is there a record with ID > 1 holding the same two names, swapped? (Yes, ID 3) -> Ignore
Get the next record (ID = 2)
Is there a record with ID > 2 holding the same two names, swapped? (Yes, ID 4) -> Ignore
Get the next record (ID = 3)
Is there a record with ID > 3 holding the same two names, swapped? (No) -> Include
Get the next record (ID = 4)
Is there a record with ID > 4 holding the same two names, swapped? (No) -> Include
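The walk-through above can be replayed in plain Python as a rough sketch, reading each check as "is there a record with a higher ID that holds the same two names, swapped?". The function name `walk` is hypothetical, and the nested scan is quadratic; it only mirrors the reasoning, not how a database would actually evaluate the query.

```python
# Sample rows from the question: (id, autor1, autor2).
ROWS = [
    (1, "Abel, Gustav", "Abel, Gustave"),
    (2, "Abel, Gustav", "Abele, Gustav"),
    (3, "Abel, Gustave", "Abel, Gustav"),
    (4, "Abele, Gustav", "Abel, Gustav"),
]


def walk(rows):
    """Replay the step-by-step check; return (id, mirror id or None, decision)."""
    decisions = []
    for row_id, autor1, autor2 in rows:
        # Find the first record with a higher ID holding the same names swapped.
        mirror = next(
            (
                other_id
                for other_id, other1, other2 in rows
                if other_id > row_id and other1 == autor2 and other2 == autor1
            ),
            None,
        )
        decisions.append((row_id, mirror, "Include" if mirror is None else "Ignore"))
    return decisions


if __name__ == "__main__":
    for row_id, mirror, decision in walk(ROWS):
        print(f"ID = {row_id}, later mirror: {mirror} -> {decision}")
```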

Tags: sql, duplicates, ignore-duplicates

lightxx