I've been struggling with this for quite some time but i just can't figure it out.
I have a table with 3 columns. 2 columns containing names and the third one containing the Damerau Levensthein distance ( http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance ) between those names.
each column has includes every single name, that means all the names present in the autor1 column are present in the autor2 column as well. as a result i have twice the rows needed, just with the autor1 and autor2 columns swapped.
as an example, row 3 equals to row 1 just with the autor columns swapped, the same goes for 2-4. how would i formulate a query that omits those "duplicates"?
id-|------autor1----|------autor2-----| dld
1 -| Abel, Gustav -| Abel, Gustave | 1
2 -| Abel, Gustav -| Abele, Gustav | 1
3 -| Abel, Gustave| Abel, Gustav --| 1
4 -| Abele, Gustav | Abel, Gustav -| 1
to
|------autor1----|------autor2-----| dld
| Abel, Gustav -| Abel, Gustave | 1
| Abel, Gustav -| Abele, Gustav | 1
Using NOT EXISTS
works on all DBMS's I know of. The complexity of this is to not forget to include a clause on id
. Without it, nothing would be returned.
SELECT *
FROM YourTable yto
WHERE NOT EXISTS (
SELECT *
FROM YourTable yti
WHERE yti.autor2 = yto.autor1
AND yti.id > yto.id
)
Edit
step by step, following is a breakdown of the logic behind the statement
ID = 1
)ID > 1
and autor1 = autor2
(Yes, ID 3) -> Ignore
ID = 2
)ID > 2
and autor1 = autor2
(Yes, ID 4) -> Ignore
ID = 3
)ID > 3
and autor1 = autor2
(No) -> Include
ID = 4
)ID > 4
and autor1 = autor2
(No) -> Include
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With