Stop search engines from indexing specific parts of the page

Tags: javascript, html, php, seo

I have a PHP page that renders a book of, say, 100 pages. Each page has its own URL (e.g. /my-book/page-one, /my-book/page-two, etc.).

When the user flips pages, I update the URL through the History API, using url.js.

Because all of the book content is rendered on the server, search engines (Google in particular) index the content, but under the wrong URLs: for example, a snippet from page-two is listed under the URL for page-one.

How can I stop search engines (at least Google) from indexing all the content on the page, so that they index only the visible book page?

Would it work if I rendered the content differently, for example as <div data-page-number="1" data-content="Lorem ipsum..."></div>, and then converted it to the needed format in JavaScript? That would make the page slower, and I'm not sure whether Google would still index the content that JavaScript inserts.

The code looks like this:

<div data-page="1">Page 1</div>
<div data-page="2">Page 2</div>
<div data-page="3" class="current-page">Page 3</div>
<div data-page="4">Page 4</div>
<div data-page="5">Page 5</div>

Only the .current-page div is visible. The same content is served at multiple URLs because the user needs to be able to flip between pages.

For example, /book/page/3 renders this HTML, while /book/page/4 renders the same thing; the only difference is that the current-page class is on the 4th element.

Google did index the different URLs, but incorrectly: for example, the snippet Page 5 links to /book/page/2, which shows the user Page 2 (not Page 5).

How can I tell Google (and other search engines) that I only want the content in .current-page indexed?

asked May 06 '16 09:05

Ionică Bizău

2 Answers

As I understand it, the issue is that you serve the same content at many URLs, such as:

www.my-awesome-domain.com/my-book/page/42

www.my-awesome-domain.com/my-book/page/7

The visible content of the page is then changed by JavaScript that runs when the user clicks elements on your site.

In this case you need to do two things:

1. Mark your URLs as canonical pages in any of the ways described in this Google document: https://support.google.com/webmasters/answer/139066?hl=en
2. Make sure each page loads into the same state after a full page refresh. For example, you can use a hash parameter when navigating.

Googlebot now executes JavaScript, as announced on Google's official blog: https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html

So if each page behaves correctly when you hit Refresh (F5) and you specify the canonical property, the pages will be crawled correctly, and following a link from the search results will take you to the matching page.
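To illustrate the "same state after refresh" requirement, here is a minimal sketch of deriving the visible page from the URL on load. The helper names (pageFromUrl, currentPage), the "#page=N" hash format, and the fallback to page 1 are assumptions made for illustration; they are not part of url.js or of the asker's code.

```javascript
// Hypothetical helper: extract the page number from a URL such as
// "/book/page/4" or "/book#page=4". Returns null when none is found.
function pageFromUrl(url) {
  // The base is only used to resolve relative URLs; it is a placeholder.
  const u = new URL(url, "https://example.com");
  const fromHash = /^#page=(\d+)$/.exec(u.hash);
  if (fromHash) return Number(fromHash[1]);
  const fromPath = /\/page\/(\d+)\/?$/.exec(u.pathname);
  return fromPath ? Number(fromPath[1]) : null;
}

// Decide which page should carry the "current-page" class after a reload.
// Falls back to page 1 (an assumption) when the URL has no valid page.
function currentPage(url, totalPages) {
  const n = pageFromUrl(url);
  return n !== null && n >= 1 && n <= totalPages ? n : 1;
}

// In the browser, with the markup from the question:
// const n = currentPage(window.location.href, 100);
// document.querySelectorAll("[data-page]").forEach((el) =>
//   el.classList.toggle("current-page", Number(el.dataset.page) === n));
```

Keeping the URL parsing in a pure function means the same rule can run on initial load and after every History API change, so a refresh always lands on the page the URL names.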

If you need more guidance on how to do this with url.js, please post another question (so that it is properly documented for others) and I will be glad to help.

answered Sep 27 '22 19:09

OBender

The answer is really simple: you can't do it. There is no technical way to serve the same content under different URLs and ask search engines to index only part of it.

If you are OK with having only one page indexed, you can use canonical URLs, as suggested above. Place a canonical URL that points to the main page on every sub-page.
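As a rough sketch of the canonical approach, the tag below is what each sub-page would carry in its <head>. The function names and the example URL are hypothetical; on a PHP-rendered page the same string would simply be emitted by the template.

```javascript
// Escape characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the canonical <link> tag for the <head> of every sub-page.
// `mainUrl` is the single URL you want indexed (hypothetical value below).
function canonicalTag(mainUrl) {
  return `<link rel="canonical" href="${escapeAttr(mainUrl)}">`;
}

console.log(canonicalTag("https://www.my-awesome-domain.com/my-book"));
```

Every sub-page (/my-book/page-one, /my-book/page-two, and so on) would output the same tag, which tells search engines to treat the main URL as the one to index.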

You may come across a "hack" that uses the special tags supported by the Google Search Appliance: googleon and googleoff.

https://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/admin_crawl/preparing.html

The catch is that this will most likely not work with Googlebot (at least, nobody guarantees that it will) or with any other search engine.
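For reference, this is roughly what the googleoff/googleon markup would look like around the non-visible pages. The renderPages function and its input shape are assumptions made for illustration, and, as noted above, these comments are a Google Search Appliance feature that Googlebot is not expected to honor.

```javascript
// Wrap every non-current page in Google Search Appliance exclusion comments.
// pages: array of { number, html }; current: the visible page number.
// Hypothetical helper; the html values are assumed to be already-safe markup.
function renderPages(pages, current) {
  return pages
    .map(({ number, html }) => {
      const isCurrent = number === current;
      const cls = isCurrent ? ' class="current-page"' : "";
      const div = `<div data-page="${number}"${cls}>${html}</div>`;
      // Only the visible page is left outside the exclusion comments.
      return isCurrent
        ? div
        : `<!--googleoff: index-->${div}<!--googleon: index-->`;
    })
    .join("\n");
}

console.log(
  renderPages(
    [
      { number: 1, html: "Page 1" },
      { number: 2, html: "Page 2" },
    ],
    2
  )
);
```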

answered Sep 27 '22 19:09

Aleksander Wons
