 

How to tell google bot to skip part of HTML?

Tags: html, css, seo

There is a lot of information about the opposite situation, where people try to put content into the HTML that is visible to Google's bots but not to users. In my case I need the reverse: to hide some of the HTML from the Google bot. The question is: how?

Flash is not an answer.
I would also prefer not to use fancy Ajax tricks (mainly because I need the content right away, not on document ready).
robots.txt is not an answer either, because it works on URLs, not on parts of a page. Would some special CSS or simple JavaScript work? Is there a special HTML tag for this?

Giedrius asked Jan 11 '12


People also ask

How do I avoid Googlebot?

You can prevent a page or other resource from appearing in Google Search by including a noindex meta tag or header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, Google will drop that page entirely from Google Search results, regardless of whether other sites link to it.
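
For illustration, the tag and header mentioned above look like this (the header form assumes you control the server's response headers):

<!-- in the page's <head> -->
<meta name="robots" content="noindex">

or, as an HTTP response header:

X-Robots-Tag: noindex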

Can Googlebot crawl my site?

Googlebot can crawl the first 15MB of an HTML file or supported text-based file. Any resources referenced in the HTML such as images, videos, CSS, and JavaScript are fetched separately. After the first 15MB of the file, Googlebot stops crawling and only considers the first 15MB of the file for indexing.

How do I stop websites from crawling?

You can block access in the following ways: To prevent your site from appearing in Google News, block access to Googlebot-News using a robots.txt file. To prevent your site from appearing in Google News and Google Search, block access to Googlebot using a robots.txt file.
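
For example, a robots.txt at the site root covering both cases described above might look like this (a minimal sketch; "/" blocks the whole site):

# keep the site out of Google News only
User-agent: Googlebot-News
Disallow: /

# keep the site out of Google News and Google Search
User-agent: Googlebot
Disallow: /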

What do Google bots look for?

Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. Google's crawlers are also programmed to avoid crawling a site too fast, so that they do not overload it.


2 Answers

Maybe base64-encoding the content on the server side and then decoding it on the client side could work?

Code:

<!-- visible to Google -->
<p> Hi, Google Bot! </p>

<!-- not visible from here on -->
<script type="text/javascript">
// PHP emits the base64-encoded markup; atob() decodes it in the browser
document.write(atob("<?php echo base64_encode('<b>hey there, user</b>'); ?>"));
</script>

How it looks to the bot:

<!-- visible to Google -->
<p> Hi, Google Bot! </p>

<!-- not visible from here on -->
<script type="text/javascript">
// after PHP has run, only the encoded string is left in the page source
document.write(atob("PGI+aGV5IHRoZXJlLCB1c2VyPC9iPg=="));
</script>
mishmash answered Sep 21 '22


Create a div and load its content via Ajax from an HTML file that resides in a directory blocked from robots, for example /hiddenfrombots/test.html.

Somewhere in the header (see http://api.jquery.com/jQuery.ajax/):

$.ajax({
  url: '/hiddenfrombots/test.html',
  success: function(data) {
    $('#hiddenfrombots').html(data);
  }
});

... somewhere in the body

<div id="hiddenfrombots"></div>

Create a directory "hiddenfrombots" and put the following in the robots.txt at the site root:

User-agent: *
Disallow: /hiddenfrombots/ 
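
As an extra precaution, assuming the site runs on Apache with mod_headers, an .htaccess inside that directory can also send a noindex header so the fragment is not indexed even if it is ever fetched directly:

# /hiddenfrombots/.htaccess
Header set X-Robots-Tag "noindex, nofollow"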
Al-Punk answered Sep 21 '22