 

How does the PageRank algorithm deal with webpages without outbound links?

I am learning about the PageRank algorithm, so sorry for some newbie questions. I understand that the PR value of each page is calculated by summing the contributions of the incoming links to that page.

Now I am bothered by a statement on Wikipedia that "the PageRank values sum to one".

In the example shown on Wikipedia, if every page has an outbound link, then the probabilities of all pages should sum to one. However, if a page does not have any outbound link, such as page A in the example, then the sum should not be 1, right?

Thus, does the PageRank algorithm have to assume that every page has at least one outbound link? Could someone elaborate on how PageRank deals with pages without any incoming or outbound links? How do the formulas change accordingly? Thanks
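The leak described in the question can be reproduced with a minimal sketch. It assumes a hypothetical three-page graph in which A has no outbound links, and uses the update rule in Wikipedia's form, PR(u) = (1-d)/N + d * sum of PR(v)/L(v) over the pages v linking to u, with the damping factor d = 0.85 (called `damping` in the code; `naive_step` is a hypothetical name). The rank that reaches A is never passed on, so the total ends up well below 1.

```python
# Hypothetical three-page graph: A has no outbound links (a dangling page).
links = {"A": [], "B": ["A", "C"], "C": ["A"]}


def naive_step(pr, links, damping=0.85):
    """One update of PR(u) = (1 - damping)/N + damping * sum(PR(v)/L(v)) over pages v linking to u."""
    n = len(links)
    new = {p: (1 - damping) / n for p in links}
    for v, outs in links.items():
        # A dangling page has no out-links, so its rank is simply dropped here.
        for u in outs:
            new[u] += damping * pr[v] / len(outs)
    return new


pr = {p: 1 / len(links) for p in links}
for _ in range(50):
    pr = naive_step(pr, links)
print(sum(pr.values()))  # about 0.253, not 1
```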

Cassie asked Feb 02 '14 05:02

1 Answer

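As PageRank is described in the original article and in the Wikipedia article, it is indeed not defined when out-degree(v) = 0 for some v. Writing d for the probability of a random jump, the probability of moving from v to u is P(v,u) = d/n + (1-d) * a(v,u)/out-degree(v), where a(v,u) is 1 if v links to u and 0 otherwise. For a node with no out-links this becomes P(v,u) = d/n + (1-d) * 0/0, which is undefined.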
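A minimal sketch of that transition probability, assuming the graph is a Python dict mapping each page to the list of pages it links to, with d = 0.15 as the random-jump probability (the function name `transition_prob` is hypothetical). For an ordinary page the row of probabilities sums to 1; for a dangling page the formula has nothing to divide by.

```python
def transition_prob(links, v, u, d=0.15):
    """P(v,u) = d/n + (1-d) * a(v,u)/out-degree(v), with d the random-jump probability."""
    n = len(links)
    out_degree = len(links[v])
    if out_degree == 0:
        # The (1-d) * 0/0 term is undefined: the formula says nothing about where to go from v.
        raise ValueError(f"{v!r} is a dangling node, so P({v!r}, u) is undefined")
    a = 1 if u in links[v] else 0  # a(v,u): does v link to u?
    return d / n + (1 - d) * a / out_degree


# Hypothetical graph: A is dangling.
links = {"A": [], "B": ["A", "C"], "C": ["A"]}
print(sum(transition_prob(links, "B", u) for u in links))  # 1, up to rounding
try:
    transition_prob(links, "A", "B")
except ValueError as err:
    print(err)
```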

A node that has no outgoing edges is called a dangling node, and there are basically three common ways to handle them:

  1. Eliminate such nodes from the graph, and repeat the process iteratively until there are no dangling nodes left (removing a node can turn pages that linked only to it into new dangling nodes).
  2. Treat those pages as linking back to the pages that link to them (i.e., for each edge (u,v), if out-degree(v) = 0, regard (v,u) as an edge).
  3. Link the dangling node to all pages (usually including itself), which effectively makes the probability of a random jump from this node 1.
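The three options can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the graph is a Python dict mapping each page to the list of pages it links to (every linked page also being a key), the function names are hypothetical, and `d` = 0.15 follows the answer's notation (probability of a random jump). The power iteration handles option 3 directly, and in all three cases the ranks sum to 1 again.

```python
def remove_dangling(links):
    """Option 1: repeatedly drop pages without out-links, and the links pointing at them."""
    links = {v: list(outs) for v, outs in links.items()}
    while True:
        dangling = {v for v, outs in links.items() if not outs}
        if not dangling:
            return links
        # Removing these pages may leave other pages with no out-links, hence the loop.
        links = {v: [u for u in outs if u not in dangling]
                 for v, outs in links.items() if v not in dangling}


def link_back(links):
    """Option 2: for every edge (u, v) where v is dangling, add the edge (v, u)."""
    new = {v: list(outs) for v, outs in links.items()}
    for u, outs in links.items():
        for v in outs:
            if not links[v]:
                new[v].append(u)
    return new


def pagerank(links, d=0.15, iters=100):
    """Power iteration with d = probability of a random jump.

    Option 3 is built in: a dangling page sends its rank to every page, itself
    included, with probability 1/n, i.e. its random-jump probability is 1.
    """
    nodes = list(links)
    n = len(nodes)
    pr = {p: 1 / n for p in nodes}
    for _ in range(iters):
        new = {p: 0.0 for p in nodes}
        for v in nodes:
            outs = links[v]
            jump = d if outs else 1.0
            for u in nodes:
                new[u] += pr[v] * jump / n
            for u in outs:
                new[u] += pr[v] * (1 - d) / len(outs)
        pr = new
    return pr


# Hypothetical graph: C has no outbound links.
links = {"A": ["B"], "B": ["A", "C"], "C": []}
for name, graph in [("option 1", remove_dangling(links)),
                    ("option 2", link_back(links)),
                    ("option 3", links)]:
    ranks = pagerank(graph)
    print(name, {p: round(r, 3) for p, r in ranks.items()}, "sum:", round(sum(ranks.values()), 6))
```

Note that option 1 changes the set of pages (C receives no rank at all), and on a graph without cycles it removes every page. Option 2 still leaves a page dangling if nothing links to it, which is why this `pagerank` sketch keeps the option 3 fallback.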

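As for a page with no incoming links, that should not be an issue, because everything is perfectly defined. Such a node will have a page rank of exactly d/n: the only way to reach it is a random jump, each node v jumps to it with probability d/n, and since the page ranks of all nodes sum to 1, the total probability of being in it is d/n. (This assumes the dangling-node fix does not add links into it; option 3, for example, does.)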
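A quick numerical check of the d/n claim, assuming a hypothetical four-page graph in which every page has at least one out-link and nothing links to page D, with d = 0.15 as the random-jump probability (`pagerank_no_dangling` is a hypothetical name).

```python
def pagerank_no_dangling(links, d=0.15, iters=100):
    """Power iteration; assumes every page has at least one out-link. d = random-jump probability."""
    n = len(links)
    pr = {p: 1 / n for p in links}
    for _ in range(iters):
        # The ranks sum to 1, so the random-jump mass reaching each page is d/n.
        new = {p: d / n for p in links}
        for v, outs in links.items():
            for u in outs:
                new[u] += (1 - d) * pr[v] / len(outs)
        pr = new
    return pr


# Hypothetical graph: no page links to D, but every page has an out-link.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"], "D": ["A"]}
pr = pagerank_no_dangling(links)
print(pr["D"], 0.15 / len(links))  # both equal d/n
```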

Hope that answered your question!

amit answered Sep 19 '22 23:09