Hi all,
First of all, I'm not trying to create a social network, facebook is big enough! (comic) I've chosen this question as example because it fits exactly on what I'm trying to do.
Imagine that I have in MySQL a users
table and a user_connections
table with 'friend requests'. If so, it would be something like this:
Users Table:
userid username
1 John
2 Amalia
3 Stewie
4 Stuart
5 Ron
6 Harry
7 Joseph
8 Tiago
9 Anselmo
10 Maria
User Connections Table:
userid_request userid_accepted
2 3
7 2
3 4
7 8
5 6
4 5
8 9
4 7
9 10
6 1
10 7
1 2
Now I want to find circles between friends and create a structure array and put that circle on the database (none of the arrays can include the same friends that another has already).
Return Example:
// First Circle of Friends
Circleid => 1
CircleStructure => Array(
1 => 2,
2 => 3,
3 => 4,
4 => 5,
5 => 6,
6 => 1,
)
// Second Circle of Friends
Circleid => 2
CircleStructure => Array(
7 => 8,
8 => 9,
9 => 10,
10 => 7,
)
I'm trying to think of an algorithm to do that, but I think it will take a lot of processing time because it would randomly search the database until it 'closes' a circle.
PS: The minimum structure length of a circle is 3 connections and the limit is 100 (so the daemon doesn't search the entire database)
EDIT:
I've think on something like this:
function browse_user($userget='random',$users_history=array()){
$user = user::get($userget);
$users_history[] = $user['userid'];
$connections = user::connection::getByUser($user['userid']);
foreach($connections as $connection){
$userid = ($connection['userid_request']!=$user['userid']) ? $connection['userid_request'] : $connection['userid_accepted'];
// Start the circle array
if(in_array($userid,$users_history)) return array($user['userid'] => $userid);
$res = browse_user($userid, $users_history);
if($res!==false){
// Continue the circle array
return $res + array($user['userid'] => $userid);
}
}
return false;
}
while(true){
$res = browse_user();
// Yuppy, friend circle found!
if($res!==false){
user::circle::create($res);
}
// Start from scratch again!
}
The problem with this function is that it could search the entire database without finding the biggest circle, or the best match.
Doing this kind of operations on large datasets is almost always a (too) big job for a single batch. When working with large amounts of data, you should build indexes continuously. Make a circle test at every time an "user" is added or become "friend" with another "user", and store the resulting circles in an index table. Then when a new "user" signs up or becomes "friend" with another "user", you should use your index database to find new circles based on the old ones.
Edit:
I got a quite enthusiastic on this problem, so I made a proof-of-concept class of boisvert's ideas on how to solve this. I put the code on GitHub at: https://github.com/alfreddatakillen/Six-Degrees-of-Separation (it would be a bit messy to post it here).
It seems to work pretty well. However, I have not tested it with huge amounts of data. I'm pretty sure that you will have to optimize this, since it basicly is a brute-force attack storing every step on the way... (If you do optimize or use the code, I'd be happy if you push changes to GitHub - I might want to use it in a future project myself.)
:-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With