Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pagerank vs SVD

Tags:

Pagerank works on the nodegraph of a series of pages and the directed edges formed by their respective inward and outward links. Thus the rank of a particular page is broadly a locally-induced effect in the nodegraph.

SVD, on the other hand, works on a whole matrix of values, and has no directionality - a link between site A and site B would only register as a 1 on the correct matrix element. It is a global system, so ranking is a global effect.

Given the extreme sparseness of web-derived matrices, I would expect SVD to be a bad performer here, as it requires a complete dataset, and has significant memory requirements.

Is that true? Does Pagerank outdo SVD largely because it is a nodegraph-based algorithm? How can Pagerank infer semantic relevance from a page beyond the number of times a word is mentioned? Or would that be a second step, performed after Pagerank has ranked the pages?

like image 251
Phil H Avatar asked Dec 08 '09 16:12

Phil H


People also ask

Does Google use SVD?

SVD is utilized in many applications such as data analysis and dimensionality reduction, image compression, Google's PageRank algorithm, Netflix's recommender system and many more.

What is SVD used for?

Singular Value Decomposition (SVD) is a widely used technique to decompose a matrix into several component matrices, exposing many of the useful and interesting properties of the original matrix.

What is rank in SVD?

The rank can be thought of as the dimensionality of the vector space spanned by its rows or its columns. Lastly, the rank of A is equal to the number of non-zero singular values! Consider the SVD of a matrix A that has rank k: A = USV.

Does any matrix have an SVD?

Also, singular value decomposition is defined for all matrices (rectangular or square) unlike the more commonly used spectral decomposition in Linear Algebra.


1 Answers

There are two issues here: which measure is easy to compute, and which yields the information that we're looking for? I don't know the answer to either question, but I can perhaps give a partial answer.

Firstly, the relevance. Both quantities are centrality measures, to use a term from network theory. PageRank computes (a variant of) eigenvector centrality, while the SVD apparently leads to the Hyperlink-Induced Topics Search (HITS) algorithm. I got this from this handout from Peter Dodds (University of Vermont). They measure different things, but it's not clear to me which one is the most relevant for measuring the importance of a webpage.

Secondly, the computational cost. Mathematically speaking, the PageRank is the dominant eigenvector of the (modified) adjacency matrix - as explained on the Wikipedia page - while HITS gives the dominant singular vector of the adjacency matrix. Both are defined by the global network of webpages and links between them, and both can be computed by only considering the node graph locally. So at first sight, I think that the computational cost is roughly equal.

In conclusion, I don't know why PageRank is better than SVD; it's not even clear to me that it is better than SVD.

like image 84
Jitse Niesen Avatar answered Oct 11 '22 09:10

Jitse Niesen