.htaccess for SEO bots crawling single page applications without hashbangs

Using a pushState-enabled page, you would normally redirect SEO bots using the _escaped_fragment_ convention. You can read more about that in Google's AJAX crawling specification.

The convention assumes that you will be using a hashbang (#!) prefix before all of the URIs on a single-page application. SEO bots will escape these fragments by replacing the hashbang with their own recognizable token, _escaped_fragment_, when making a page request.

//Your page
http://example.com/#!home

//Requested by bots as
http://example.com/?_escaped_fragment_=home

This allows the site administrator to detect bots, and redirect them to a cached prerendered page.

RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
# %1 is the fragment value captured by the RewriteCond above
RewriteRule ^ https://s3.amazonaws.com/mybucket/%1 [P,L]

The problem is that the hashbang is being phased out quickly now that pushState support is widely adopted. It's also really ugly and isn't very intuitive to a user.

So what if we used HTML5 mode where pushState guides the entire user application?

//Your index is using pushState
http://example.com/

//Your category is using pushState (not a folder)
http://example.com/category

//Your category/subcategory is using pushState
http://example.com/category/subcategory

Can rewrite rules guide bots to your cached version using this newer convention? There is a related question, but it only accounts for the index edge case. Google also has an article that suggests an opt-in method for that single edge case: placing <meta name="fragment" content="!"> in the <head> of the page. Again, this covers only one edge case. Here we are talking about handling every page as an opt-in scenario.
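For reference, Google's opt-in is a single tag in the head of the served page; a minimal skeleton would look like this (page content here is placeholder):

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Tells crawlers to re-request this pushState URL with ?_escaped_fragment_= -->
  <meta name="fragment" content="!">
</head>
<body>
  <!-- single-page application bootstraps here -->
</body>
</html>
```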

http://example.com/?_escaped_fragment_=
http://example.com/category?_escaped_fragment_=
http://example.com/category/subcategory?_escaped_fragment_=

I'm thinking that _escaped_fragment_ could still be used as an identifier for SEO bots, and that I could extract everything between the domain and this identifier to append to my bucket location, like:

RewriteCond %{QUERY_STRING} ^_escaped_fragment_=$
# (high-level example; I have no idea how to do this)
# extract "category/subcategory" == $2
# from http://example.com/category/subcategory?_escaped_fragment_=
RewriteRule ^(.*)$  https://s3.amazonaws.com/mybucket/$2 [P,QSA,L]

What's the best way to handle this?

Asked by Dan Kanze, Jul 29 '13

1 Answer

Had a similar problem on a single page web app.

The only solution I found to this problem was effectively creating static versions of pages for the purpose of making them navigable by Google's (and other) bots.

You could do this yourself, but there are also services that do exactly this and create your static cache for you (and serve up the snapshots to the bots over their CDN).

I ended up using SEO4Ajax, although other similar services are available!
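Whether you build the snapshots yourself or use a service, the routing the question asks about can be done with the path capture alone; a minimal sketch, assuming your snapshots are stored under keys that mirror the pushState paths (the bucket URL reuses the question's example and is hypothetical):

```apache
RewriteEngine On

# Match bot requests like /category/subcategory?_escaped_fragment_=
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=$

# $1 already captures the full path ("category/subcategory"),
# so no second capture group is needed.
# [P] proxies to the snapshot (requires mod_proxy); [L] stops further rewriting.
RewriteRule ^(.*)$ https://s3.amazonaws.com/mybucket/$1 [P,L]
```

If mod_proxy is unavailable on your host, swapping [P,L] for [R=302,L] would redirect the bot to the snapshot instead of proxying it transparently.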

Answered by Matt McDonald, Sep 28 '22