Pagerank works on the nodegraph of a series of pages and the directed edges formed by their respective inward and outward links. Thus the rank of a particular page is broadly a locally-induced effect in the nodegraph.
SVD, on the other hand, works on a whole matrix of values, and has no directionality - a link between site A and site B would only register as a 1 on the correct matrix element. It is a global system, so ranking is a global effect.
Given the extreme sparseness of web-derived matrices, I would expect SVD to be a bad performer here, as it requires a complete dataset, and has significant memory requirements.
Is that true? Does Pagerank outdo SVD largely because it is a nodegraph-based algorithm? How can Pagerank infer semantic relevance from a page beyond the number of times a word is mentioned? Or would that be a second step, performed after Pagerank has ranked the pages?
SVD is utilized in many applications such as data analysis and dimensionality reduction, image compression, Google's PageRank algorithm, Netflix's recommender system and many more.
Singular Value Decomposition (SVD) is a widely used technique to decompose a matrix into several component matrices, exposing many of the useful and interesting properties of the original matrix.
The rank can be thought of as the dimensionality of the vector space spanned by its rows or its columns. Lastly, the rank of A is equal to the number of non-zero singular values! Consider the SVD of a matrix A that has rank k: A = USV.
Also, singular value decomposition is defined for all matrices (rectangular or square) unlike the more commonly used spectral decomposition in Linear Algebra.
There are two issues here: which measure is easy to compute, and which yields the information that we're looking for? I don't know the answer to either question, but I can perhaps give a partial answer.
Firstly, the relevance. Both quantities are centrality measures, to use a term from network theory. PageRank computes (a variant of) eigenvector centrality, while the SVD apparently leads to the Hyperlink-Induced Topics Search (HITS) algorithm. I got this from this handout from Peter Dodds (University of Vermont). They measure different things, but it's not clear to me which one is the most relevant for measuring the importance of a webpage.
Secondly, the computational cost. Mathematically speaking, the PageRank is the dominant eigenvector of the (modified) adjacency matrix - as explained on the Wikipedia page - while HITS gives the dominant singular vector of the adjacency matrix. Both are defined by the global network of webpages and links between them, and both can be computed by only considering the node graph locally. So at first sight, I think that the computational cost is roughly equal.
In conclusion, I don't know why PageRank is better than SVD; it's not even clear to me that it is better than SVD.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With