Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL selecting people you may know

Tags:

sql

mysql

The question you're asking appears subjective and is likely to be closed.

I wasn't surprised when I saw above horrible warning while I was filling in title field.

I read almost every thread talking about friends of friends or mutual friends but I'm not sure I found the right solution that I want to do.

I'm sorry I'm not good at English nor SQL.

How can I find the right answer while being not good at both of languages?

I decided I have to ask. I won't let myself down for down-votes or any duplication warnings.

As I want the answer, I'll write down as sincerely as possible for any further similar problems can be helped.

I have a table for friend relations.

FRIEND (TABLE)
-----------------------------------
PLAYER_ID(PK,FK)   FRIEND_ID(PK,FK)
-----------------------------------
1                  2                 // 1 knows 2
2                  1                 // 2 knows 1
1                  3                 // 1 knows 3
2                  3                 // 2 knows 3
2                  4                 // 2 knows 4
2                  5                 // 2 knows 5 // updated
3                  5                 // 3 knows 5 // updated
1                  100
1                  200
1                  300
100                400
200                400
300                400

Both composite primary keys are also foreign keys from PLAYER table.

I asked and got answered from such nice people for "people know each other".

SQL view for acquaintance from table.

And I have a view like this.

ACQUAINTANCE (VIEW)
-----------------------------------
PLAYER_ID(PK,FK)   FRIEND_ID(PK,FK)
-----------------------------------
1                  2                 // 1 knows 2
2                  1                 // 2 knows 1

As you might be noticed, this relationships' business logic has following two purposes.

  1. One player can say he or she knows someone else.
  2. When both people say they know each other, they can be said as acquaintance.

And, now, I want to know is there any good way for

  1. Selecting other PLAYER_IDs
  2. With given PLAYER(PLAYER_ID) (say 1)
  3. Which each is one of `friends of given PLAYER's direct friends'
  4. Which each is not the PLAYER himself (excluding 1 -> 2 -> 1)
  5. Which each is not the PLAYER's direct friends (excluding 3 from 1 -> 2 -> 3 by 1 -> 3)
  6. Order by number of mutual friends if possible.

I think the Justin Niessner's answer in "people you may know" sql query is the closest path that I must follow.

Thanks in advance.

I'll close the thread if this subject is really duplicated and not necessary.

UPDATE --------------------------------------------------------------

for Raphaël Althaus's comment whose name is same with my future daughter (is it boy's name?),

3 is a candidate for friends of friends of 1 because

1 knows 2
2 knows 3

but excluded because

1 already knows 3

Basically I want to serve for the given player the

people he or she may know
which is not himself or herself // this is nothing but obvious
which each is not already known to himself

With above table

by 1 -> 2 -> 4 and 1 -> 3 -> 5

4 and 5 can be suggested for 1 as 'people you may know'

order by number of mutual friends will be perfect
but I don't think I can understand even if someone show me how. sorry.

Thank you.

UPDATE ---------------------------------------------------------------------

I think I must try step by step by myself from what I've learned FROM HERE WITH VARIOUS PEOPLE even if it's not the right answer. Please let me know if I'm doing anything wrong.

First of all, let me self join the FRIEND table itself.

SELECT *
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID

prints

+-----------+-----------+-----------+-----------+
| PLAYER_ID | FRIEND_ID | PLAYER_ID | FRIEND_ID |
+-----------+-----------+-----------+-----------+
|         1 |         2 |         2 |         1 |
|         1 |         2 |         2 |         3 |
|         1 |         2 |         2 |         4 |
|         1 |         2 |         2 |         5 |
|         1 |         3 |         3 |         5 |
|         2 |         1 |         1 |         2 |
|         2 |         1 |         1 |         3 |
|         2 |         3 |         3 |         5 |
+-----------+-----------+-----------+-----------+

F2.FRIEND_ID only

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID

prints

+-----------+
| FRIEND_ID |
+-----------+
|         1 |
|         3 |
|         4 |
|         5 |
|         5 |
|         2 |
|         3 |
|         5 |
+-----------+

for 1 only

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1;

prints

+-----------+
| FRIEND_ID |
+-----------+
|         1 |
|         3 |
|         4 |
|         5 |
|         5 |
+-----------+

not 1

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1 
AND F2.FRIEND_ID != 1;

prints

+-----------+
| FRIEND_ID |
+-----------+
|         3 |
|         4 |
|         5 |
|         5 |
+-----------+

not 1's direct knowns

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1);

prints

+-----------+
| FRIEND_ID |
+-----------+
|         4 |
|         5 |
|         5 |
+-----------+

I think I'm getting there.

UPDATE -----------------------------------------------------------------

Following paths added

1 -> 100 -> 400
1 -> 200 -> 400
1 -> 300 -> 400

And the last query prints (again)

+-----------+
| FRIEND_ID |
+-----------+
|         4 |
|         5 |
|         5 |
|       400 |
|       400 |
|       400 |
+-----------+

at last, I got the candidates: 4, 5, 400

Putting distinct surely work for the primary goal

SELECT DISTINCT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1);

prints

+-----------+
| FRIEND_ID |
+-----------+
|         4 |
|         5 |
|       400 |
+-----------+

And, now, ordering by mutual counts needed.

Here comes the number of mutual friends for each candidates.

+-----------+
| FRIEND_ID |
+-----------+
|         4 | 1 (1 -> 2 -> 4)
|         5 | 2 (1 -> 2 -> 5, 1 -> 3 -> 5)
|       400 | 3 (1 -> 100 -> 400, 1 -> 200 -> 400, 1 -> 300 -> 400)
+-----------+

How can I calculate and order by those number of mutual friends?

SELECT F2.FRIEND_ID, COUNT(*)
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1)
GROUP BY F2.FRIEND_ID;

prints

+-----------+----------+
| FRIEND_ID | COUNT(*) |
+-----------+----------+
|         4 |        1 |
|         5 |        2 |
|       400 |        3 |
+-----------+----------+

I got it!

SELECT F2.FRIEND_ID, COUNT(*) AS MFC
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1)
GROUP BY F2.FRIEND_ID
ORDER BY MFC DESC;

prints

+-----------+-----+
| FRIEND_ID | MFC |
+-----------+-----+
|       400 |   3 |
|         5 |   2 |
|         4 |   1 |
+-----------+-----+

Can anybody please confirm this? Is that query optimal? Any possible performance problem when make it as a view?

Thank you.

UPDATE --------------------------------------------------------------------------------------------

I created a view as

CREATE VIEW FOLLOWABLE AS
    SELECT F1.PlAYER_ID, F2.FRIEND_ID AS FOLLOWABLE_ID, COUNT(*) AS MFC
    FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
    WHERE F2.FRIEND_ID != F1.PLAYER_ID
    AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = F1.PLAYER_ID)
    GROUP BY F2.FRIEND_ID
    ORDER BY MFC DESC;

and tested.

mysql> select * from FOLLOWABLE;
+-----------+---------------+-----+
| PlAYER_ID | FOLLOWABLE_ID | MFC |
+-----------+---------------+-----+
|         1 |           400 |   3 |
|         1 |             5 |   2 |
|         2 |           100 |   1 |
|         2 |           200 |   1 |
|         2 |           300 |   1 |
|         1 |             4 |   1 |
+-----------+---------------+-----+
6 rows in set (0.01 sec)

mysql> select * from FOLLOWABLE WHERE PLAYER_ID = 1;
+-----------+---------------+-----+
| PlAYER_ID | FOLLOWABLE_ID | MFC |
+-----------+---------------+-----+
|         1 |           400 |   3 |
|         1 |             5 |   2 |
|         1 |             4 |   1 |
+-----------+---------------+-----+
3 rows in set (0.00 sec)
like image 606
Jin Kwon Avatar asked Jun 11 '12 12:06

Jin Kwon


People also ask

How do I SELECT a person in SQL?

SQL SELECT Statement Examples For example: SELECT * FROM [Person]. [Person]; This query selects all data from the Person table in the Person schema.

How do I SELECT multiple users in SQL?

It is called relation division, here is one way to do so: SELECT `user` FROM tablename WHERE `option` in ('A', 'D') GROUP BY `user` HAVING COUNT(DISTINCT `option`) = 2; See it in action here: SQL Fiddle Demo.

Can we use SELECT * with group by in SQL?

You can use a SELECT command with a GROUP BY clause to group all rows that have identical values in a specified column or combination of columns, into a single row.

Why do we use SELECT * in SQL?

An asterisk (" * ") can be used to specify that the query should return all columns of the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include: The FROM clause, which indicates the table(s) to retrieve data from.


1 Answers

use this EDIT

SELECT `friend_id` AS `possible_friend_id`
FROM `friends`
WHERE `player_id` IN (        --selecting those who are known
    SELECT `friend_id`        --by freinds of #1
    FROM `friends`
    WHERE `player_id` = 1) 
AND `friend_id` NOT IN (      --but not those who are known by #1
    SELECT `friend_id`
    FROM `friends`
    WHERE `player_id` = 1)
AND NOT `friend_id` = 1       --and are not #1 himself
                              --if one is known by multiple people
                              --he'll be multiple time in the list
GROUP BY `possible_friend_id` --so we group
ORDER BY COUNT(*) DESC        --and order by amount of repeatings
like image 136
Valerij Avatar answered Oct 13 '22 08:10

Valerij