Google Not Properly Caching My AJAX Crawlable Application?

I wrote a SPA using Durandal 2.0, and I'm using HTML5 pushState for navigation. I have set everything up according to Google's specifications. I am not including hashbangs (/#!) in my URLs; instead I am using the meta fragment tag.

<meta name="fragment" content="!">

I am using a headless browser, PhantomJS, to serve Googlebot the fully rendered HTML of my AJAX application. In MVC I detect ?_escaped_fragment_= and perform a 302 redirect to a URL that serves the fully rendered HTML. That part is working fine. To test it, navigate to https://insureflo.com/?_escaped_fragment_= and you will see the redirect and the fully rendered HTML content of my site, https://insureflo.com.
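The detection step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the ASP.NET MVC code used on the site: the function name `snapshot_location` is hypothetical, and only the `/HtmlSnapshot?url=` target and the `_escaped_fragment_` parameter come from this question. It assumes the snapshot endpoint wants the URL a normal visitor would see, with the crawler parameter removed.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote

ESCAPED_FRAGMENT = "_escaped_fragment_"


def snapshot_location(request_url):
    """Return a redirect target for a crawler request, or None otherwise.

    Hypothetical helper: the name and the exact URL layout are assumptions.
    """
    parts = urlsplit(request_url)
    # keep_blank_values=True so that "?_escaped_fragment_=" (empty value) is seen.
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == ESCAPED_FRAGMENT for key, _ in params):
        return None  # regular visitor: serve the normal SPA shell

    # Drop the crawler parameter to recover the URL a user would see.
    remaining = [(k, v) for k, v in params if k != ESCAPED_FRAGMENT]
    pretty = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(remaining), "")
    )
    # Percent-encode the whole URL so it can travel as a single query value.
    return "/HtmlSnapshot?url=" + quote(pretty, safe="")
```

A caller would issue the 302 only when the function returns a value, and serve the regular page when it returns `None`.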

I have a sitemap that lists all of my URLs, including the root. Despite all of this, Google still does not cache or crawl my application properly and still shows the app's loading page. I was under the impression that you could use pushState and rely on the meta fragment tag, and that Google would parse it and request the URL with _escaped_fragment_ added automatically.

However, when I use Fetch as Google in Webmaster Tools, I get the following response:

 HTTP/1.1 302 Found
 Cache-Control: private
 Content-Type: text/html; charset=utf-8
 Location: /HtmlSnapshot?url=https%3A%2F%2Finsureflo.com%2F%23
 Server: Microsoft-IIS/8.0
 X-AspNetMvc-Version: 4.0
 X-AspNet-Version: 4.0.30319
 X-Powered-By: ASP.NET
 Date: Sun, 08 Sep 2013 06:59:28 GMT
 Connection: close
 Content-Length: 168

 <html><head><title>Object moved</title></head><body>
 <h2>Object moved to <a href="/HtmlSnapshot?url=https%3A%2F%2Finsureflo.com%2F%23">here</a>.</h2>
 </body></html>

I believe this 302 is correct according to the specification, but why isn't Google indexing the redirected content and displaying it in both the HTML view and the image preview of the site? Also, when I view the cache from Google's search results I get a blank page, and viewing the source shows the regular page, not the fully rendered HTML as expected. For example:

http://webcache.googleusercontent.com/search?q=cache:https://insureflo.com

At this point I have read and reread the specs, and I believe I have met the requirements for crawling an AJAX application. I could really use some help getting this figured out. Am I missing something here? Thank you!

ccorrin asked Sep 10 '13 04:09


1 Answer

There are a number of things I found when trying to do the same thing.

  1. Fetch as Google in Webmaster Tools will tell you that there was a redirect, but it will not follow it.

  2. For Googlebot to actually follow the redirect during a crawl, the site being redirected to must also be registered in Webmaster Tools.

  3. If you want Google to crawl your site from a sitemap and make the _escaped_fragment_ request, the links in your sitemap have to be in the format:

    http://yourlink.com/#!/stuff
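For reference, here is a rough Python sketch of the rewrite that Google's AJAX crawling scheme describes for a link in that format: everything after `#!` is moved into an `_escaped_fragment_` query parameter. The function name `to_escaped_fragment` is hypothetical, and the percent-encoding via `urllib.parse.quote` is an assumption that may encode more characters than the scheme strictly requires.

```python
from urllib.parse import quote


def to_escaped_fragment(url):
    """Map a '#!' URL to the '_escaped_fragment_' form a crawler would request.

    Hypothetical helper for illustration; URLs without '#!' are returned as-is.
    """
    base, marker, fragment = url.partition("#!")
    if not marker:
        return url
    # Append to an existing query string if the base URL already has one.
    joiner = "&" if "?" in base else "?"
    # Encode characters such as '&', '#', '%', '+' and spaces in the fragment;
    # '/' and '=' are left readable.
    return base + joiner + "_escaped_fragment_=" + quote(fragment, safe="/=")


print(to_escaped_fragment("http://yourlink.com/#!/stuff"))
```

Checking a few sitemap entries this way makes it easier to confirm that the server answers the rewritten URLs with the rendered snapshot.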

I have a more detailed write-up on my blog at

http://mark.stratmann.me/articles/the-great-ajax-seo-saga

Mark Stratmann answered Sep 28 '22 06:09