I have a PHP page that renders a book of, let's say, 100 pages. Each page has a specific URL (e.g. /my-book/page-one, /my-book/page-two, etc.).
When flipping the pages, I change the URL using the History API, via url.js.
Since all the book content is rendered on the server side, the problem is that the content is indexed by search engines (I'm referring especially to Google), but the URLs are wrong (e.g. it finds a snippet on page-two but the URL is page-one).
How can I stop search engines (at least Google) from indexing all the content on the page, so that they index only the visible book page?
Would it work if I rendered the content in a different way, for example as <div data-page-number="1" data-content="Lorem ipsum..."></div>, and then changed it into the needed format on the JavaScript side? That would make the page slower, and in fact I'm not sure whether Google would still index the content changed by JavaScript.
The code looks like this:
<div data-page="1">Page 1</div>
<div data-page="2">Page 2</div>
<div data-page="3" class="current-page">Page 3</div>
<div data-page="4">Page 4</div>
<div data-page="5">Page 5</div>
The only visible div is the .current-page one. The same content is served on multiple URLs because that's needed so the user can flip between pages.
For example, /book/page/3 will render this piece of HTML, while /book/page/4 renders the same thing, the only difference being the current-page class, which is added to the 4th element.
Google did index the different URLs, but it did so wrongly: for example, the snippet Page 5 links to /book/page/2, which renders Page 2 to the user (not Page 5).
How can I tell Google (and other search engines) that I only want the content in .current-page to be indexed?
You can prevent a page or other resource from appearing in Google Search by including a noindex meta tag or header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, Google will drop that page entirely from Google Search results, regardless of whether other sites link to it.
Indexing is the process by which search engines organise information before a search to enable super-fast responses to queries. Searching through individual pages for keywords and topics would be a very slow process for search engines to identify relevant information.
As long as the noindex tag stays in place, your page will not be indexed or searchable via search engines. To hide a single outgoing link on a page, add a rel attribute (for example rel="nofollow") to the <a href> </a> link tag. You may wish to use this on links on other pages that lead to the specific page you want to block.
You can prevent new content from appearing in results by adding the URL slug to a robots.txt file. Search engines use these files to understand how to index a website's content. If search engines have already indexed your content, you can add a "noindex" meta tag to the content's head HTML.
As I understand it, the issue is that you have the same content for many URLs, like:
www.my-awesome-domain.com/my-book/page/42
www.my-awesome-domain.com/my-book/page/7
And the visible content of the page is changed by JavaScript that runs when the user clicks some elements on your site.
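In this case you need to do 2 things: make each URL show the correct page when it is loaded directly (for example on Refresh), and specify the canonical URL for each page.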
Today Googlebot executes JavaScript, as announced on their official blog: https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html
So if you achieve proper page behavior when hitting Refresh (F5) and specify the canonical property for each page, the pages will be crawled correctly, and following a link from the search results will take you to the linked page.
If you need more guidance on how to do it with url.js, please post another question (so it will be properly documented for others) and I will be glad to help.
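A minimal sketch of the two points above, assuming the /book/page/<number> URL scheme from the question and using the native History API rather than url.js. The function names (pageFromPath, canonicalTag, showPageForCurrentUrl, flipTo) and the origin passed to canonicalTag are hypothetical, not part of any library. The canonical tag itself would normally be printed into the <head> by the server for each URL.

```javascript
// Assumed URL scheme, taken from the question's examples: /book/page/<number>
const PAGE_PATH = /^\/book\/page\/(\d+)\/?$/;

// Extract the page number from a path, or return null if it does not match.
function pageFromPath(path) {
  const match = PAGE_PATH.exec(path);
  return match ? Number(match[1]) : null;
}

// Build the canonical <link> tag for one page.
// `origin` is a placeholder for the site's own origin.
function canonicalTag(origin, pageNumber) {
  return `<link rel="canonical" href="${origin}/book/page/${pageNumber}">`;
}

// Browser-only: give .current-page to the div that the current URL names,
// so a direct load or Refresh shows the page the URL points to.
function showPageForCurrentUrl() {
  const pageNumber = pageFromPath(window.location.pathname);
  if (pageNumber === null) return;
  document.querySelectorAll("[data-page]").forEach((div) => {
    div.classList.toggle("current-page", Number(div.dataset.page) === pageNumber);
  });
}

// Browser-only: flip to a page and keep the address bar in sync.
function flipTo(pageNumber) {
  history.pushState({ pageNumber }, "", `/book/page/${pageNumber}`);
  showPageForCurrentUrl();
}

// Only wire up events when running in a browser.
if (typeof window !== "undefined") {
  window.addEventListener("popstate", showPageForCurrentUrl);
  showPageForCurrentUrl();
}
```

With this in place, /book/page/4 shows Page 4 whether it is reached by flipping, by the Back button, or by a direct load, which is the behavior a crawler following the canonical URL would see.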
The answer is really simple: you can't do it. There is no technical way to keep the same content under different URLs and ask search engines to index only part of it.
If you are OK with having only one page indexed you can use, as suggested before, canonical URLs. You place the canonical URL that links to the main page on every sub-page.
You may find a "hack" that uses special tags meant for the Google Search Appliance: googleon and googleoff.
https://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/admin_crawl/preparing.html
The only issue is that this will most likely not work with Googlebot (at least no one will guarantee it will) or any other search engine.
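For illustration only, here is a rough sketch of what that hack could look like, wrapping every non-current page in the Google Search Appliance comment tags. The renderPages function and the shape of its pages argument are assumptions made for this example, and as noted above there is no guarantee that Googlebot or any other crawler honors these comments.

```javascript
// Render all book pages, hiding every non-current page from the
// Google Search Appliance indexer with googleoff/googleon comments.
// `pages` is an assumed array of { number, html } objects.
function renderPages(pages, currentPage) {
  return pages
    .map(({ number, html }) => {
      const isCurrent = number === currentPage;
      const cls = isCurrent ? ' class="current-page"' : "";
      const div = `<div data-page="${number}"${cls}>${html}</div>`;
      // Only the current page is left outside the googleoff region.
      return isCurrent
        ? div
        : `<!--googleoff: index-->${div}<!--googleon: index-->`;
    })
    .join("\n");
}

// Example: page 2 is the visible one.
const markup = renderPages(
  [
    { number: 1, html: "Page 1" },
    { number: 2, html: "Page 2" },
  ],
  2
);
```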