I'm facing a very poorly designed database with a non-normalized table X. This table X should have a N:M relationship with another table Y.
The problem is that this relationship is currently 1:N and the jerry-rigged solution until now was to duplicate the entries when there was various registries to be related.
Simplifying, I have this:
| ID | TEXT | LOCATION_ID |
| 1 | foo | 1 |
| 2 | foo | 2 |
| 3 | bar | 1 |
| 4 | bar | 4 |
| 5 | bar | 3 |
I have to normalize this table. So, my first idea was try to obtain pairs of similar registries. Something like this:
| a.ID | b.ID |
| 1 | 2 |
| 3 | 4 |
| 3 | 5 |
Experimenting a little bit:
SELECT a.id, b.id
FROM mytable AS a
INNER JOIN mytable AS b
ON a.text = b.text AND a.id != b.id
GROUP BY a.id, b.id
This lead to a problem like this:
| a.ID | b.ID |
| 1 | 2 |
| 2 | 1 |
| 3 | 4 |
| 3 | 5 |
| 4 | 3 |
| 4 | 5 |
| 5 | 3 |
| 5 | 4 |
The pairs were duplicated.
After some digging, I realized that this was more efficient:
SELECT a.id, b.id
FROM mytable AS a
INNER JOIN mytable AS b
ON a.text = b.text AND a.id < b.id
GROUP BY a.id, b.id
So, I got this:
| a.ID | b.ID |
| 1 | 2 |
| 3 | 4 |
| 3 | 5 |
| 4 | 5 |
But I still need to get rid of that last register.
The DISTINCT clause filters out FULL DUPLICATE ROWS. It goes right after the SELECT keyword, since it applies to the entire row, not single columns. You cannot use it in between columns.
Group on only one side and take the MIN()
of the other:
SELECT MIN(a.ID) a, b.ID b
FROM mytable a JOIN mytable b ON b.text = a.text AND b.ID > a.ID
GROUP BY b.ID
See it on sqlfiddle.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With