Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using python's networkX to compute personalized page rank

I am trying to build a directed graph and compute personalized page rank over this graph. So suppose I have a graph with vertices {1,2,3,4} and edges going from 2, 3, and 4 to vertex 1, I would like to:

(1) compute the personalized page rank of every vertex with respect to 1

(2) compute the personalized page rank of every vertex with respect to 2.

The question is how I should pass this option in the personalized page rank function. The following code does not seem to do what I want:

import networkx as nx

G = nx.DiGraph()

[G.add_node(k) for k in [1,2,3,4]]
G.add_edge(2,1)
G.add_edge(3,1)
G.add_edge(4,1)


ppr1 = nx.pagerank(G,personalization={1:1, 2:1, 3:1, 4:1})
ppr2 = nx.pagerank(G,personalization={1:2, 2:2, 3:2, 4:2})

Right now ppr1 == ppr2, even though it should not be the case.

================================================================== UPDATE.

In response to comment below, my understanding of personalized page rank comes from the following:

An equivalent definition is in terms of the terminal node of a random walk starting from s. Let (X0, X1, . . . , XL) be a random walk starting from X0 = s of length L ∼ Geometric(α). Here by L ∼ Geometric(α) we mean Pr[L = ] = (1−α) α. This walk starts at s and does the following at each step: with probability α, terminate; and with the remaining probability 1 − α, continue to a random out-neighbor of the current node. Here if the current node is u, the random neighbor v ∈ N out(u) is chosen with probability wu,v if the graph is weighted or with uniform probability 1/dout(u) if the graph is unweighted. Then the PPR of any node t is the probability that this walk stops at t:

Found on page 6 of this thesis: https://cs.stanford.edu/people/plofgren/bidirectional_ppr_thesis.pdf

So I suppose what I am looking for when computing "the personalized page rank of t with respect to s" is if we start a random walk from s according to the process described above, what is the probability that this walk terminates at t.

like image 621
xiaolingxiao Avatar asked Apr 04 '17 01:04

xiaolingxiao


1 Answers

In the conceptualization of PageRank, a random surfer is moving around following links. At each step there is a nonzero probability the surfer goes to a random page (as opposed to following a link). If the choice of that random page is weighted, then it is referred to as personalized PageRank.

In your case you want that jump to be to a single specific page. So you need to tell it that all the other pages have zero probability of being selected in those steps when the surfer jumps rather than following an edge.

ppr1 = nx.pagerank(G,personalization={1:1, 2:0, 3:0, 4:0})
ppr2 = nx.pagerank(G,personalization={1:0, 2:1, 3:0, 4:0})
like image 98
Joel Avatar answered Sep 20 '22 23:09

Joel