How you would code an efficient algorithm which can return a social 'distance' between two users.
For example, when you visit a profile on LinkedIn you can see what is the distance between you and the user.
-> user A is friend with user B - and B is friend with C. when A will visit C (the distance will be 1)
The graph is huge and so I'm wondering how it can be perform so fast.
I know that this question is likely to be closed but I really think it is a programming/algorithm question - I would not specify any languages because I am interested about the concept.
assuming you don't have any heuristic function about the distance to the target, the best solution that is valid is bi-directional BFS:
Algorithm idea: do a BFS search simultaneously from the source and the target: [BFS until depth 1 in both, until depth 2 in both, ....].
The algorithm will end when you find a vertex v, which is in both BFS's front.
Algorithm behavior: The vertex v that terminates the algorithm's run will be exactly in the middle between the source and the target.
This algorithm will yield much better result in most cases then BFS from the source [explanation why it is better then BFS follows], and will surely provide an answer, if one exist.
why is it better then BFS from the source?
assume the distance between source to target is k
, and the branch factor is B
[every vertex has B edges].
BFS will open: 1 + B + B^2 + ... + B^k
vertices.
bi-directional BFS will open: 2 + 2B + 2B^2 + 2B^3 + .. + 2B^(k/2)
vertices.
for large B and k, the second is obviously much better than the first.
EDIT:
NOTE, that this solution does NOT require storing the whole graph in memory, it only requires implementing a function: successor(v)
which returns all the successors of a vertex [all vertices you can get to, within 1 step from v]. With this, only the nodes you open [2 + 2B + ... + 2B^(k/2)
as explained above] should be stored. To further save memory, you can use Iterative Deepening DFS from one direction, instead of BFS, but it will consume more time.
I would have assumed that this would be done by applying a shortest path algorithm, such as breadth first search to a graph database. But they appear to store their entire graph in memory, at least according to this.
I'm sure that the algorithm ultimately boils down to some form of shortest path one over a graph structure (nodes and edges).
Edit: Changed the algorithm as per comments.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With