Making AJAX Applications crawlable without backend control

I've built a tool that leverages EmberJS and GitHub Pages to create a blogging application that is rendered in-browser. It uses JavaScript to fetch Markdown files and render them into the body of the application. Because all content is fetched via AJAX requests, I'm not sure of the best way to make the content crawlable by Google, etc.

I've read many articles that suggest using PhantomJS to handle the _escaped_fragment_ requests, but since the content is hosted on GitHub, there's no way to run anything server-side.

Is there a possible workaround for this (such as rendering something ahead of time before pushing content to GitHub), or am I just experiencing the shortcomings of JavaScript applications?

asked Aug 09 '13 by hodgesmr


3 Answers

The question is: can Googlebot execute basic JavaScript?

If not, then no. As I read it, your app requires JavaScript to render any page, which leaves you without a bot-friendly access method.

If yes, then yes:

Because JavaScript can access the URL's query string via location.search, you can put plausible URLs for Google to fetch in href attributes, have your JS app interpret them, and override them for regular users in onclick handlers:

<a href="/?a=My-Blog-Post" onclick="someFunc(this.href);return false;">

This would be paired with code in your app's onload handler that reads location.search, parses the query string, and fetches whichever .md file the designated URL parameter names, in the hope that Googlebot runs that onload and gets the specified content. This is a variant of the domain.com/#!ajax/path style of routing many sites use. Both are completely client-side, but the query-string variant signals to Googlebot that the page is worth fetching as a distinct URL.
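A minimal sketch of that onload handler (the "a" parameter name, the /posts/ path, and renderMarkdown() are placeholders for whatever your app actually uses, not something from the original answer):

// On load, look for a post slug in the query string and fetch the
// matching Markdown file via XHR.
window.addEventListener('load', function () {
  var match = /[?&]a=([^&]+)/.exec(window.location.search);
  if (!match) return; // no post requested, show the normal index

  var slug = decodeURIComponent(match[1]);
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/posts/' + slug + '.md');
  xhr.onload = function () {
    if (xhr.status === 200) {
      renderMarkdown(xhr.responseText); // your app's existing Markdown renderer
    }
  };
  xhr.send();
});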

You may be able to test this with http://google.com/webmasters, which has a "fetch as googlebot" feature.

answered Nov 14 '22 by Umbrella


I created a small module that helps with this. Take a look at http://alexferreira.github.io/seojs/

answered Nov 14 '22 by alexferreira


Without a backend server doing some of the logic, this gets a bit tricky...

But you might take inspiration from what is discussed at http://meta.discourse.org/t/seo-compared-to-other-well-known-tools/3914 and http://eviltrout.com/2013/06/19/adding-support-for-search-engines-to-your-javascript-applications.html

You could use your build script to generate copies of your index file in a directory tree that mirrors your route definitions, so post/:post_slug becomes /post/slug/index.html. Each page would have a <noscript> tag with very basic content and links for the current post. You could even preload your CurrentPost JSON hash into the page to save an XHR request.
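As a rough sketch only (the posts.json manifest, its slug/title/html fields, and the output layout are assumptions, not part of the original answer), a Node build step run before pushing to GitHub Pages could look like this:

// build.js - pre-render a crawlable page per post.
// Copies index.html into /post/<slug>/index.html, injecting a <noscript>
// block with basic content plus the post JSON for preloading.
var fs = require('fs');
var path = require('path');

var template = fs.readFileSync('index.html', 'utf8');
var posts = JSON.parse(fs.readFileSync('posts.json', 'utf8')); // [{slug, title, html}, ...]

posts.forEach(function (post) {
  var noscript = '<noscript><h1>' + post.title + '</h1>' + post.html +
                 '<a href="/post/' + post.slug + '/">' + post.title + '</a></noscript>';
  var preload = '<script>window.CurrentPost = ' + JSON.stringify(post) + ';</script>';
  var page = template.replace('</body>', noscript + preload + '</body>');

  var dir = path.join('post', post.slug);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'index.html'), page);
});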

That means using the History API, which is not very IE-friendly, but maybe that's not a big issue.
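For completeness, a minimal client-side sketch of routing with the History API (showPost() and the data-post attribute are hypothetical names, standing in for your app's own router):

// Intercept internal post links, push a real /post/slug URL, and render
// client-side; re-render on back/forward via popstate.
document.addEventListener('click', function (e) {
  var link = e.target.closest('a[data-post]');
  if (!link) return;
  e.preventDefault();
  history.pushState({ slug: link.dataset.post }, '', link.getAttribute('href'));
  showPost(link.dataset.post);
});

window.addEventListener('popstate', function (e) {
  if (e.state && e.state.slug) showPost(e.state.slug);
});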

answered Nov 14 '22 by colymba