Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to implement LinkedIn like "How you are connected to" feature?

LinkedIn has this cool feature in which while visiting some user's profile, LinkedIn prompts how you are connecting to that user through the network.

Assuming that the visitor and the profile owner are two nodes of a graph where the nodes represent users and edge represents friendship, a simple solution could be a bfs starting from both the nodes up to certain level and see if there are any intersections. The intersections would be the network link-nodes.

Although this sounds neat, the problem is that in order to determine friends of each person, a separate DB query is needed. When the network goes deeper than 2 levels, it would be highly time consuming algorithm. Is there a better efficient alternative? If not, how can we add better hardware support (parallel computing, grids, distributed database etc) in order to bring down the time required for computation?

like image 822
Chirantan Avatar asked Oct 13 '09 05:10

Chirantan


2 Answers

You can see how this can be done in the article Graphs in the database: SQL meets social networks by Lorenzo Alberton. The example code is written for PostgreSQL using CTEs. However, I doubt that using a RDBMS for this will perform well. I wrote up an article on how to do the same stuff as in the mentioned article using a native graph database, in this case Neo4j: Social networks in the database: using a graph database. Other than the differences in performance, a graph database also simplifies the task by providing a graph API that makes it easy to handle traversals that would be extremely complex to write in SQL (or by using stored procedures). I wrote a bit more on graph databases in this thread and see this one too.

like image 149
nawroth Avatar answered Nov 16 '22 10:11

nawroth


Without some kind of recursive stored procedure (CTE in SQL Server 2005+), you'll need multiple round trips as the levels get deeper. However, a good cache infrastructure could really help performance as the most popular / active users' connection lists would remain cached. A read/write through cache mechanism would make things even better (cache updates cascade to db updates, cache reads cascade to db reads)

like image 1
Chris Avatar answered Nov 16 '22 11:11

Chris