Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find pairs that share the one property (column) through multiple tuples (rows)?

Tags:

sql

mysql

I'm trying to find pairs of users that enjoy the same set of TV shows in this simplified example

Let's say I have a table where each user gets an entry for each TV Show that they enjoy:

|USER | Show        |
|-----|-------------|
|001  | Lost        |
|001  | South Park  |
|002  | Lost        |
|003  | Lost        |
|003  | South Park  |
|004  | South Park  |
|005  | Lost        |
|006  | Lost        |

Then I would like a result of:

|USER1 |USER2 |
|------|------|
|001   |003   |
|003   |001   |
|002   |005   |
|002   |006   |
|005   |002   |
|005   |006   |
|006   |002   |
|006   |005   |

Or an even better version would be:

|USER1 |USER2 |
|------|------|
|001   |003   |
|002   |005   |
|002   |006   |
|005   |006   |

Which basically says: User 1 likes the same set of shows as User 3.

I've been playing around with GROUP BY and JOIN, but I still can't find the answer :(.

So far, I've found that using

SELECT s1.User as USER1, s2.User as USER2, s1.Show as Show 
FROM Shows s1 JOIN (SELECT * FROM Shows) s2 
ON s1.Shows=s2.Shows AND s1.User!=s2.User;

Which yields pairs of users and the Show they have in common. But I don't know where to go from here.

like image 386
Cesar Andreu Avatar asked Oct 07 '22 09:10

Cesar Andreu


1 Answers

If you can accept CSV instead of tabulated results, you could simply group the table twice:

SELECT GROUP_CONCAT(User) FROM (
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) t GROUP BY s

Otherwise, you can join the above subquery to itself:

SELECT DISTINCT LEAST(t.User, u.User) AS User1,
             GREATEST(t.User, u.User) AS User2
FROM (
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) t JOIN (
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) u USING (s)
WHERE t.User <> u.User

See them on sqlfiddle.

Of course, if duplicate (User, Show) pairs are guaranteed not to exist in the Shows table, you could improve performance by removing the DISTINCT keyword from the GROUP_CONCAT() aggregations.

like image 98
eggyal Avatar answered Oct 10 '22 03:10

eggyal