I'm making a site which will have reviews of the privacy policies of hundreds of thousands of other sites on the internet. Its initial content is based on running a script over the CommonCrawl 5-billion-page web dump and analyzing all the privacy policies it contains, to identify certain characteristics (e.g. "Sells your personal info").
According to the SEOmoz Beginner's Guide to SEO:
Search engines tend to only crawl about 100 links on any given page. This loose restriction is necessary to keep down on spam and conserve rankings.
I was wondering what would be a smart way to create a web of navigation that leaves no page orphaned, but would still avoid this SEO penalty they speak of. I have a few ideas.
Wikipedia and StackOverflow have obviously solved this problem very well by allowing users to categorize or tag all of the pages. In my case I don't have that luxury, but I want to find the best option available.
At the core of this question is how Google responds to different navigation structures. Does it penalize those who create a web of pages in a programmatic/meaningless way? Or does it not care so long as everything is connected via links?
Google PageRank does not penalize you for having more than 100 links on a page. But the more links a page has, the less value each individual link passes, because a page's PageRank is divided among all of its outbound links.
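To make that dilution concrete, here is a rough sketch of the classic PageRank model. This is illustrative only; Google's production ranking is far more complex, and the numbers below are just the textbook formula:

```python
# Rough sketch of link-value dilution in the classic PageRank model.
# Illustrative only -- Google's actual ranking system is far more complex.
DAMPING = 0.85  # conventional damping factor from the original PageRank paper

def value_passed_per_link(page_rank: float, outbound_links: int) -> float:
    """PageRank a single link passes when a page splits its score evenly."""
    return DAMPING * page_rank / outbound_links

print(value_passed_per_link(1.0, 10))    # 0.085   -- each link carries real weight
print(value_passed_per_link(1.0, 1000))  # 0.00085 -- each link is nearly worthless
```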
Quoting SEOmoz and Matt Cutts:
Could You Be Penalized?
Before we dig in too deep, I want to make it clear that the 100-link limit has never been a penalty situation. In an August 2007 interview, Rand quotes Matt Cutts as saying:
The "keep the number of links to under 100" is in the technical guideline section, not the quality guidelines section. That means we're not going to remove a page if you have 101 or 102 links on the page. Think of this more as a rule of thumb.
At the time, it's likely that Google started ignoring links after a certain point, but at worst this kept those post-100 links from passing PageRank. The page itself wasn't going to be de-indexed or penalized.
So the question really is how to get Google to take all your links seriously. You accomplish this by generating an XML sitemap for Google to crawl (you can either have a static sitemap.xml file, or its content can be dynamically generated). You will want to read up on the About Sitemaps section of the Google Webmaster Tools help documents.
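Generating the sitemap can be a few lines of scripting against your database of reviewed sites. Here is a minimal sketch using Python's standard library; the `review_urls` list is just a placeholder for whatever your real data source is:

```python
# Minimal sketch: write a sitemap.xml from a list of page URLs.
# "review_urls" stands in for your real data source (e.g. a database query).
import xml.etree.ElementTree as ET

review_urls = [
    "https://example.com/reviews/site-a",
    "https://example.com/reviews/site-b",
]

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=NS)
for url in review_urls:
    url_el = ET.SubElement(urlset, "url")
    ET.SubElement(url_el, "loc").text = url

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```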
Just as having too many links on a page is an issue, having too many URLs in an XML sitemap file is also an issue (the sitemap protocol caps each file at 50,000 URLs). What you need to do is paginate your XML sitemap. Jeff Atwood talks about how StackOverflow implements this: The Importance of Sitemaps. Jeff also discusses the same issue on StackOverflow podcast #24.
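Continuing the sketch above, pagination just means splitting the URL list into chunks of at most 50,000, writing one sitemap file per chunk, and pointing a sitemap index at them. The file names and base URL below are placeholders for your real setup:

```python
# Sketch of sitemap pagination: one file per 50,000 URLs plus a sitemap index.
# File names and the base URL are placeholders for your real setup.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHUNK_SIZE = 50_000  # protocol limit per sitemap file
BASE = "https://example.com"

def write_sitemaps(urls):
    chunks = [urls[i:i + CHUNK_SIZE] for i in range(0, len(urls), CHUNK_SIZE)]
    index = ET.Element("sitemapindex", xmlns=NS)
    for n, chunk in enumerate(chunks, start=1):
        filename = f"sitemap-{n}.xml"
        urlset = ET.Element("urlset", xmlns=NS)
        for url in chunk:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        ET.ElementTree(urlset).write(filename, encoding="utf-8", xml_declaration=True)
        # Each child sitemap gets listed in the index file that Google fetches.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{BASE}/{filename}"
    ET.ElementTree(index).write("sitemap-index.xml", encoding="utf-8", xml_declaration=True)
```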
This concept applies to Bing as well.