
What's the least redundant way to make a site with JavaScript-generated HTML crawlable?

After reading Google's policy on making Ajax-generated content crawlable, along with many developers' blog posts and Stack Overflow Q&A threads on the subject, I'm left with the conclusion that there is no way to make a site with only JavaScript/Ajax-generated HTML crawlable. A site I'm currently working on isn't getting a fair amount of its content indexed. The entire presentation layer for our non-indexed content is built in JavaScript, which generates HTML from JSON returned by Ajax-based web service calls, and we believe Google is not indexing the content because of that. Is that correct?

The only solution seems to be to also have a "fall-back" version of the site for search engines (specifically Google) where all the HTML and content would be generated as it traditionally has been, on the server-side. For clients with JavaScript enabled, it seems that we could use essentially the same approach that we do now: using JavaScript to generate HTML from asynchronously loaded JSON.

Reading around, my understanding is that the current best practice for applying the DRY principle in creating crawlable Ajax-generated websites as described above is to use a templating engine that can use the same templates on the client-side and the server-side. For clients with JavaScript enabled, the client-side templating engine, for example mustache.js, would transform JSON data sent from the server into HTML as defined by its copy of a template file. And for search crawlers and clients with JavaScript disabled, the server-side implementation of the same templating engine, for example mustache.java, would similarly operate on its copy of the same exact template file to output HTML.
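To make the shared-template idea concrete, here is a minimal sketch. It assumes, purely for illustration, that the server can also run JavaScript (for example under Node); with mustache.js on the client and mustache.java on the server, each engine would play the role of the stand-in `render` function below. The template, the `render` helper, and the `renderPage`/`renderInto` names are all hypothetical.

```javascript
// Hypothetical shared template. In a real project it would live in its own
// file and be read by both the browser code and the server code.
const productTemplate = '<li class="product"><a href="{{url}}">{{name}}</a></li>';

// Escape HTML-significant characters so data cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tiny stand-in for a Mustache-style engine (not the real library):
// replaces each {{key}} with the escaped value from data, or "" if missing.
function render(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(data, key)
      ? escapeHtml(data[key])
      : "";
  });
}

// Server side: build complete HTML for crawlers and clients without JavaScript.
function renderPage(products) {
  const items = products.map(function (p) {
    return render(productTemplate, p);
  });
  return '<ul id="products">' + items.join("") + "</ul>";
}

// Client side: same template, applied to JSON that arrived from an Ajax call.
// `element` is any object with an innerHTML property (a DOM node in a browser).
function renderInto(element, products) {
  element.innerHTML = products
    .map(function (p) {
      return render(productTemplate, p);
    })
    .join("");
}
```

The point of the sketch is that only the template is shared; each side still needs its own small amount of glue code to fetch data and decide where the output goes.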

If that solution is correct, then how is this different from the approaches used 4 or 5 years ago by front-end-heavy sites, which essentially had to maintain two copies of the templating code: one copy for users with JavaScript enabled (nearly everyone) and another copy (e.g. in FreeMarker or Velocity) for search engines and browsers without JavaScript enabled (nearly no one)? It seems like there should be a better way.

Does this imply that two templating model layers would need to be maintained, one on the client side and one on the server side? How advisable is it to combine those client-side templates with a front-end MVC (MV*/MVVM) framework like Backbone.js, Ember.js, or YUI App Library? How do these solutions affect maintenance costs? Would it be better to try doing this without introducing more frameworks (a new templating engine and a front-end MVC framework) into a development team's technology stack? Is there a way to do this less redundantly?

If that solution isn't correct, then is there something we're missing and could be doing better with our JavaScript to keep our existing asynchronous HTML-from-JSON structure and get it indexed, so we don’t need to introduce something new to the architecture stack? We would really rather not have to update two versions of the presentation layer when the business needs change.

jqp asked Apr 18 '12 15:04


1 Answer

I think a combination of a few technologies and one manually coded hack, which you could reuse, would fix you right. Here's my crazy, half-baked idea. It's theoretical and probably not complete.

Step 1:

  • Use client-side templates, like you suggest. Put every template in a separate file (so that you can reuse them easily between the client and the server).
  • Use underscore.js templating, or reconfigure Mustache's delimiters. This way you'll get ERB-style delimiters in your templates, identical to JSP's <%= %> syntax.
  • Since they're separate files, you'll want to start developing in modules (AMD, or wrapped CommonJS) with a module loader like curl.js or require.js to load the templates in your client-side code. If you aren't doing modular development yet, it's pretty awesome. I started about a month ago. It seems hard at first, but it's just a different way to wrap your code: http://addyosmani.com/writing-modular-js/
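As a rough sketch of Step 1, here is one template wrapped so that an AMD loader (such as require.js or curl.js) or a CommonJS environment could load it. The `compile` function is a hand-written stand-in that only understands `<%= name %>`; with Underscore, `_.template(source)` would fill that role. The template contents and the `userCardModule` name are made up for illustration.

```javascript
// Stand-in compiler for ERB-style templates: handles only <%= key %>.
// Like Underscore's <%= %>, it interpolates values without HTML-escaping them.
function compile(source) {
  return function (data) {
    return source.replace(/<%=\s*(\w+)\s*%>/g, function (match, key) {
      return data[key] == null ? "" : String(data[key]);
    });
  };
}

// One template per module (hypothetical example). The raw source is exposed
// too, so a server-side engine could read the same text.
function userCardModule() {
  const source = '<a class="user" href="<%= profileUrl %>"><%= name %></a>';
  return { source: source, render: compile(source) };
}

// Register with whichever module system happens to be present.
if (typeof define === "function" && define.amd) {
  define([], userCardModule); // AMD loader in the browser
} else if (typeof module === "object" && module.exports) {
  module.exports = userCardModule(); // CommonJS, e.g. on the server
}
```

Client code would then ask the loader for this module and call `render` with the JSON it received, for example `userCardModule().render({ name: "Ada", profileUrl: "/u/ada" })` when called directly.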

OK, so now you have isolated templates. Next we need to figure out how to build a flat page out of them on the server. I only see two approaches.

Step 2:

  • You could annotate your JS so that the server can read it and see a default path for each Ajax call and which templates it links to. The server can then use the annotations to call the controller methods in the right order and fill out a flat page.
  • Or you could annotate your templates to indicate which controller they should call and provide example call params. This would be easy to maintain and would benefit front-end devs like me who have to look up controller URLs all the time. It would also tell your back-end code what to call.
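The second approach might look something like the sketch below. Everything here is hypothetical: the `@controller` comment format, the `/api/products` URL, and the in-memory `controllers` map (which stands in for real back-end controller methods) are invented to show the shape of the idea, not an existing convention.

```javascript
// Hypothetical annotated template: the first line names the controller to
// call and gives example params as JSON.
const productListTemplate = [
  '<!-- @controller /api/products {"category": "tea"} -->',
  "<h2><%= title %></h2>",
  "<p><%= count %> products</p>"
].join("\n");

// Read the annotation into { url, params }, or return null if there is none.
function readAnnotation(templateSource) {
  const match = /<!--\s*@controller\s+(\S+)\s+(\{.*?\})\s*-->/.exec(templateSource);
  if (!match) return null;
  return { url: match[1], params: JSON.parse(match[2]) };
}

// Stand-in for the back end: maps controller URLs to functions returning data.
const controllers = {
  "/api/products": function (params) {
    return { title: "Products in " + params.category, count: 3 };
  }
};

// Server side: call the annotated controller, strip the annotation line, and
// fill the <%= key %> placeholders to produce a flat fragment.
function renderFlat(templateSource) {
  const annotation = readAnnotation(templateSource);
  let data = {};
  if (annotation) {
    const controller = controllers[annotation.url];
    if (!controller) throw new Error("No controller for " + annotation.url);
    data = controller(annotation.params);
  }
  return templateSource
    .replace(/<!--\s*@controller.*?-->\n?/, "")
    .replace(/<%=\s*(\w+)\s*%>/g, function (match, key) {
      return data[key] == null ? "" : String(data[key]);
    });
}
```

The client would ignore the annotation (it is just an HTML comment) and keep loading JSON over Ajax as before, while front-end devs could read the controller URL straight from the template.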

Hope this helps. Curious to hear the best answer to this. An interesting problem.

SimplGy answered Nov 03 '22 10:11