Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL deduplicate from two columns

I've been struggling with this for quite some time but i just can't figure it out.

I have a table with 3 columns. 2 columns containing names and the third one containing the Damerau Levensthein distance ( http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance ) between those names.

each column has includes every single name, that means all the names present in the autor1 column are present in the autor2 column as well. as a result i have twice the rows needed, just with the autor1 and autor2 columns swapped.

as an example, row 3 equals to row 1 just with the autor columns swapped, the same goes for 2-4. how would i formulate a query that omits those "duplicates"?
id-|------autor1----|------autor2-----| dld
1 -| Abel, Gustav -| Abel, Gustave | 1
2 -| Abel, Gustav -| Abele, Gustav | 1
3 -| Abel, Gustave| Abel, Gustav --| 1
4 -| Abele, Gustav | Abel, Gustav -| 1

to
|------autor1----|------autor2-----| dld
| Abel, Gustav -| Abel, Gustave | 1
| Abel, Gustav -| Abele, Gustav | 1

like image 989
lightxx Avatar asked May 07 '12 12:05

lightxx


1 Answers

Using NOT EXISTS works on all DBMS's I know of. The complexity of this is to not forget to include a clause on id. Without it, nothing would be returned.

SELECT *
FROM   YourTable yto
WHERE  NOT EXISTS (
         SELECT  *
         FROM    YourTable yti
         WHERE   yti.autor2 = yto.autor1
                 AND yti.id > yto.id
       )

Edit

step by step, following is a breakdown of the logic behind the statement

  1. Get the first record (ID = 1)
  2. Is there a record where ID > 1 and autor1 = autor2 (Yes, ID 3) -> Ignore
  3. Get the next record (ID = 2)
  4. Is there a record where ID > 2 and autor1 = autor2 (Yes, ID 4) -> Ignore
  5. Get the next record (ID = 3)
  6. Is there a record where ID > 3 and autor1 = autor2 (No) -> Include
  7. Get the next record (ID = 4)
  8. Is there a record where ID > 4 and autor1 = autor2 (No) -> Include
like image 151
Lieven Keersmaekers Avatar answered Oct 01 '22 10:10

Lieven Keersmaekers