Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select distinct pairs in MySQL join (same table) with transitivity?

Tags:

sql

mysql

I'm facing a very poorly designed database with a non-normalized table X. This table X should have a N:M relationship with another table Y.

The problem is that this relationship is currently 1:N and the jerry-rigged solution until now was to duplicate the entries when there was various registries to be related.

Simplifying, I have this:

| ID | TEXT | LOCATION_ID |
| 1  | foo  |      1      |
| 2  | foo  |      2      |
| 3  | bar  |      1      |
| 4  | bar  |      4      |
| 5  | bar  |      3      |

I have to normalize this table. So, my first idea was try to obtain pairs of similar registries. Something like this:

| a.ID | b.ID | 
|   1  |  2   |
|   3  |  4   |
|   3  |  5   |

Experimenting a little bit:

SELECT a.id, b.id 
FROM mytable AS a 
INNER JOIN mytable AS b 
   ON a.text = b.text AND a.id != b.id 
GROUP BY a.id, b.id

This lead to a problem like this:

| a.ID | b.ID | 
|   1  |  2   |
|   2  |  1   |
|   3  |  4   |
|   3  |  5   |
|   4  |  3   |
|   4  |  5   |
|   5  |  3   |
|   5  |  4   |

The pairs were duplicated.

After some digging, I realized that this was more efficient:

SELECT a.id, b.id 
FROM mytable AS a 
INNER JOIN mytable AS b 
        ON a.text = b.text AND a.id < b.id 
GROUP BY a.id, b.id

So, I got this:

| a.ID | b.ID | 
|   1  |  2   |
|   3  |  4   |
|   3  |  5   |
|   4  |  5   |

But I still need to get rid of that last register.

like image 519
Henrique Barcelos Avatar asked Jul 13 '13 17:07

Henrique Barcelos


People also ask

Can I use distinct in middle of select statement?

The DISTINCT clause filters out FULL DUPLICATE ROWS. It goes right after the SELECT keyword, since it applies to the entire row, not single columns. You cannot use it in between columns.


Video Answer


1 Answers

Group on only one side and take the MIN() of the other:

SELECT   MIN(a.ID) a, b.ID b
FROM     mytable a JOIN mytable b ON b.text = a.text AND b.ID > a.ID
GROUP BY b.ID

See it on sqlfiddle.

like image 128
eggyal Avatar answered Sep 28 '22 10:09

eggyal