Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Degrees of Separation Query

Tags:

sql

mysql

I have a table of member-to-member connections. The schema is member_id, friend_id, is_active. I want to build a list of member connections of people who are friends of friends. I'm not really sure how to tackle the query, let alone in a semi-optimized way.

The table above works in a manner where the member_id and friend_id are essentially the same thing on another table. In my system, these id's are generally referred to as member_id except on this one table. For example, let's say my member_id is 21. My number can be on an infinite amount of other rows as either the member_id or the friend_id it's either based on who initiated the actual friendship request originally, that and I didn't want redundant data where I'd have dupe rows to basically do the same thing.

I'd like to have a query where I can not only establish a level of degree (think LinkedIn) but I can also establish how many mutual friends one person may have that's being displayed (think Facebook). The x factor here is the is_active column I mentioned earlier. This column could be 0 or 1. It's a simple tinyint column that acts as a on/off switch. Any friend connections with a 1 would be an active friendship whereas 0 is pending. I need to base this query off my active friends and their active friends and so on. Where none of the active friends my friends have are active friends of mine.

How can I construct a query like this (even if I can't show the level of separation and only get a mutual count)? Right now, I can sort of think of something but it involves query after query some nested in loops, and yea, I just can't picture that being anything good for my servers' overall performance or health over time.

like image 658
chris Avatar asked Feb 27 '12 16:02

chris


People also ask

How do you calculate degrees of separation?

In order to calculate the average degree of separation between the nodes in a graph, we consider one node in each of the steps, calculate the average and multiply it with the total number of nodes in that step.

Is it 6 or 7 degrees of separation?

Six degrees of separation is the theory that any person on the planet can be connected to any other person on the planet through a chain of acquaintances that has no more than five intermediaries.

Is 6 Degrees of Separation true?

They found that the average length was 6.6 hops, and that 78 per cent of the pairs could be connected in seven steps or fewer. But some were separated by as many as 29 steps. The researchers wrote: 'Via the lens provided on the world by Messenger, we find that there are about "seven degrees of separation" among people.

How does 6 degrees of separation work?

Six degrees of separation is the idea that all people are six or fewer social connections away from each other. As a result, a chain of "friend of a friend" statements can be made to connect any two people in a maximum of six steps. It is also known as the six handshakes rule.


1 Answers

Here's how to perform the search using a breadth-first, shortest path search, using JOIN. There is no magic in this algorithm, as we're using MySQL to find our answer, and we're not incorporating any fancy search algorithm that uses any kind of heuristics or optimization.

My 'friends' table has unidirectional relationships, so we do have duplicates in the sense that both '1 to 2' and '2 to 1' are stored. I'm also excluding is_active since the implementation will be obvious:

Here's the data:

member_id   friend_id
1           2
1           3
1           4
2           1
2           3
2           5
2           6
3           2
3           1
4           1
5           2
6           2
6           7
7           6
7           8
8           7

We have member 1 selected, and we're asking is 1 friends with 7, a friend of a friend, etc? A count of 0 means no, and a count of 1 means yes.

SELECT COUNT(*)
FROM friends f1
WHERE f1.member_id = 1
  AND f1.friend_id = 7

If no, then are they friend of a friend?

SELECT COUNT(*)
FROM friends f1
JOIN friends f2
  ON f2.member_id = f1.friend_id
WHERE f1.member_id = 1
  AND f2.friend_id = 7

If no, then friend of a friend of a friend?

SELECT COUNT(*)
FROM friends f1
JOIN friends f2
  ON f2.member_id = f1.friend_id
JOIN friends f3
  ON f3.member_id = f2.friend_id
WHERE f1.member_id = 1
  AND f3.friend_id = 7

And so on...

The third query would find the path '1 to 2', '2 to 6', and '6 to 7', returning the count of 1.

Each query becomes more expensive (due to the larger number of joins), so you may want to limit the search at some point. One cool thing is that this search works from both ends toward the middle, which is one simple optimization suggested for shortest path searches.

Here's how to find those mutual friend recommendations for member 1:

SELECT f2.friend_id
FROM friends f1
JOIN friends f2
  ON f2.member_id = f1.friend_id
LEFT JOIN friends f3
  ON f3.member_id = f1.member_id
  AND f3.friend_id = f2.friend_id
WHERE f1.member_id = 1
  AND f2.friend_id <> f1.member_id // Not ourself
  AND f3.friend_id IS NULL // Not already a friend
like image 120
Marcus Adams Avatar answered Sep 29 '22 06:09

Marcus Adams