Purely JavaScript Solution for Google Ajax Crawlable Spec

I have a project which is heavily JavaScript based (e.g. Node.js, Backbone.js, etc.). I'm using hashbang URLs like /#!/about and have read the Google AJAX crawlable spec. I've done a wee bit of headless UI testing with zombie, and I can easily conceive of how this could be done by setting a slight delay and returning static content to the Googlebot. But I don't really want to implement this from scratch and was hoping there was a pre-existing library that fits in with my stack. Know of one?
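For reference, the spec maps /#!/about to a request for /?_escaped_fragment_=/about, so the hand-rolled version I'm picturing is roughly this Express sketch (renderSnapshot is a hypothetical placeholder for driving zombie against the page and grabbing the HTML after the delay):

var express = require('express');
var app = express();

// Googlebot asks for /?_escaped_fragment_=/about instead of /#!/about
app.use(function(req, res, next) {
  var fragment = req.query._escaped_fragment_;
  if (fragment === undefined) return next(); // normal browser traffic

  // renderSnapshot is hypothetical: load /#!<fragment> in a headless
  // browser, wait briefly for the AJAX to settle, return the HTML.
  renderSnapshot('/#!' + fragment, function(err, html) {
    if (err) return next(err);
    res.send(html);
  });
});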

EDIT: At the time of writing I don't think this exists. However, rendering with Backbone (or similar) on both the server and the client is a plausible approach (even if not a direct answer), so I'm going to mark that as the answer, although there may be better solutions in the future.

asked Jan 19 '12 by Rob


3 Answers

Just to chime in, I ran into this issue too (I have a very AJAX/JS-heavy site), and I found this, which may be of interest:

crawlme

I have yet to try it, but it sounds like it will make the whole process a piece of cake if it works as advertised! It's a piece of Connect/Express middleware that is simply inserted before any calls to pages, and apparently takes care of the rest.
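If it works the way the README suggests, usage would be something along these lines (I haven't verified this, so treat the exact call shape as an assumption):

var express = require('express');
var crawlme = require('crawlme'); // npm install crawlme

var app = express();

// Mount before the rest of the app so crawler requests
// (those carrying _escaped_fragment_) are intercepted first.
app.use(crawlme());
app.use(express.static(__dirname + '/public'));

app.listen(3000);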

Edit:

Having tried crawlme, I had some success, but the headless browser it uses on the backend (zombie.js) was failing with some of my JavaScript content, likely because it works by emulating the DOM and thus won't be perfect.

Sooo, instead I got hold of a full WebKit-based headless browser, phantomjs, plus a set of Node bindings for it, like this:

npm install phantomjs node-phantom

I then created my own script similar to crawlme, but using phantomjs instead of zombie.js. This approach seems to work perfectly, and will render every single one of my AJAX-based pages correctly. The script I wrote to pull this off can be found here. To use it, simply:

var googlebot = require("./path-to-file");

and then, before any other calls to your app (this is using Express, but it should work with plain Connect too):

app.use(googlebot());

The source is relatively simple minus a couple of regexps, so have a gander :)
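In essence it boils down to something like this (a simplified sketch from memory, not the actual source; double-check node-phantom's callback signatures against its docs):

var phantom = require('node-phantom');

module.exports = function googlebot() {
  return function(req, res, next) {
    var fragment = req.query._escaped_fragment_;
    if (fragment === undefined) return next(); // not a crawler request

    // Re-render the original hashbang URL in a real WebKit instance
    var url = 'http://' + req.headers.host + '/#!' + fragment;
    phantom.create(function(err, ph) {
      if (err) return next(err);
      ph.createPage(function(err, page) {
        if (err) return next(err);
        page.open(url, function(err, status) {
          if (err || status !== 'success') return next(err);
          // Give the client-side JS a moment to finish its AJAX calls
          setTimeout(function() {
            page.evaluate(function() {
              return document.documentElement.outerHTML;
            }, function(err, html) {
              ph.exit();
              if (err) return next(err);
              res.send(html);
            });
          }, 1000);
        });
      });
    });
  };
};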

Result: an AJAX-heavy Node.js/Connect/Express based website can be crawled by the Googlebot.

answered Nov 12 '22 by jsdw


There is one implementation using Node.js and Backbone.js on both the server and the browser: https://github.com/Morriz/backbone-everywhere
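The underlying trick is writing view/render code that runs in both environments, so the server can answer crawler requests with the same markup the client would build. A toy sketch of that pattern (my own illustration, not backbone-everywhere's actual API):

// views/about.js - one file loaded by Node (require) and the browser (script tag)
(function(exports) {
  exports.render = function(data) {
    return '<h1>About</h1><p>' + data.text + '</p>';
  };
})(typeof exports !== 'undefined' ? exports : (window.aboutView = {}));

On the server, require('./views/about').render(...) produces finished HTML for crawlers; in the browser, the same function updates the DOM after hash changes.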

answered Nov 12 '22 by opengrid


The crawlable Node.js module seems to fit this purpose: https://npmjs.org/package/crawlable, and there's an example of such a SPA that can be rendered server-side in Node: https://github.com/trupin/crawlable-todos

answered Nov 12 '22 by Guillaume Berche