Is there a way to make robots ignore certain text?

I have a new blog (you can see it from my profile if you want), and Google's crawl results for it are just as new.

The results alarmed me. Apparently the two most common words on my site are "rss" and "feed", because I use link text like "Comments RSS", "Post Feed", etc. These two words appear in every post, while other words are rarer.

Is there a way to make these links disappear from Google's parsing? I don't want technical links getting indexed. I only want content, titles, descriptions to get indexed. I am looking for something other than replacing this text with images.

I found some old discussions via Google, dating back to 2007. A lot could have changed in three years, hopefully this too.

This question is not about robots.txt and how to make Google ignore pages. It is about making it ignore small parts of the page, or transforming the parts in such a way that it will be seen by humans and invisible to robots.

asked Jul 08 '10 19:07 by Alex


2 Answers

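There is a simple way to tell Google not to index parts of your documents: wrap them in googleoff and googleon comment tags. (As far as I know these tags were documented for the Google Search Appliance, so check whether Google's public web search honors them before relying on this.)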

<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index-->

In this example, the second paragraph will not be indexed by Google. Notice the “index” parameter, which may be set to any of the following:

  • index: content surrounded by “googleoff: index” will not be indexed by Google

  • anchor: anchor text for any links within a “googleoff: anchor” area will not be associated with the target page

  • snippet: content surrounded by “googleoff: snippet” will not be used to create snippets for search results

  • all: content surrounded by “googleoff: all” is treated as if index, anchor, and snippet were all set

source
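To preview what a crawler that honors these tags would be left with, the regions can be stripped locally. The Python below is only a rough sketch: the function name `indexable_text` and the sample page are made up for illustration, it assumes the regions are not nested, and it does not reproduce Google's actual parsing.

```python
import re

# One <!--googleoff: index--> ... <!--googleon: index--> region.
# Assumption: regions are not nested; optional whitespace inside the comment is allowed.
GOOGLEOFF_INDEX = re.compile(
    r"<!--\s*googleoff:\s*index\s*-->.*?<!--\s*googleon:\s*index\s*-->",
    re.DOTALL | re.IGNORECASE,
)
# Crude tag stripper, good enough for a preview of simple markup.
TAG = re.compile(r"<[^>]+>")


def indexable_text(html: str) -> str:
    """Return the text that remains once googleoff: index regions are dropped."""
    kept = GOOGLEOFF_INDEX.sub(" ", html)
    text = TAG.sub(" ", kept)
    return " ".join(text.split())


# Hypothetical page: the feed links are wrapped, the post text is not.
page = """
<p>My first post about Linux.</p>
<!--googleoff: index-->
<p><a href="/feed">Post Feed</a> <a href="/comments/feed">Comments RSS</a></p>
<!--googleon: index-->
"""

print(indexable_text(page))
```

With the sample page, only the post sentence survives; the "Post Feed" and "Comments RSS" link text falls inside the excluded region.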

answered Oct 14 '22 21:10 by Ormoz


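Google keeps the content of HTML elements marked with the data-nosnippet attribute out of search result snippets. Note that this only affects snippets; the content is still crawled and indexed. Google documents the attribute for span, div, and section elements: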

<p>
    This text can be included in a snippet
    <span data-nosnippet>and this part would not be shown</span>.
</p>

Source: Special tags that Google understands - Inline directives
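To check which text on a page is still eligible for snippets, the marked elements can be filtered out with Python's standard `html.parser`. This is a minimal sketch under assumptions: the class `SnippetText`, the helper `snippet_candidate_text`, and the sample markup are hypothetical, the HTML is assumed to be well formed, and unlike Google the sketch treats data-nosnippet on any element, not only span, div, and section.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class SnippetText(HTMLParser):
    """Collects text that lies outside any element carrying data-nosnippet."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside a data-nosnippet element; 0 means outside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Enter an excluded region, or go one level deeper inside one.
        if self.depth or any(name == "data-nosnippet" for name, _ in attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)


def snippet_candidate_text(html: str) -> str:
    """Return the text outside data-nosnippet elements, whitespace-normalized."""
    parser = SnippetText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Hypothetical post footer: the feed link is excluded from snippets.
sample = (
    '<p>New post about Git hooks.'
    '<span data-nosnippet> <a href="/feed">Post Feed</a></span></p>'
)
print(snippet_candidate_text(sample))
```

The depth counter is what lets nested markup, such as the link inside the span, stay excluded until the marked element closes.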

answered Oct 14 '22 20:10 by Zulu