When developing a site in AngularJS, do you have to worry about web crawlers before you start working on your site, or can you put it off until the site is finished?
I have read that HTML snapshots are a good solution, for instance. If you chose to do this, would you be able to implement it after coding the site, or would you have to build the site around this kind of functionality?
I think it's good to decide on the strategy at the beginning of the project and to implement it close to the end of the project.
We ran into this problem at the company I work for.
In all cases you will need to answer GET requests to endpoints like
...?_escaped_fragment_=/home
when, say, Google or Bing crawls the page
...#/home
See the official Google documentation for details.
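As a rough illustration of what that means on the server side, here is a minimal sketch assuming an Express server and a snapshots/ directory of pre-rendered pages (both the framework and the directory layout are assumptions made for the example, not part of the specification):

// Sketch (assumed Express app): when a request carries the _escaped_fragment_
// query parameter, serve a pre-rendered snapshot instead of the Angular app.
var express = require('express');
var path = require('path');
var app = express();

app.use(function (req, res, next) {
  var fragment = req.query._escaped_fragment_;
  if (fragment === undefined) {
    return next(); // regular browser: serve the Angular application as usual
  }
  // e.g. ?_escaped_fragment_=/home  ->  snapshots/home.html (assumed layout)
  res.sendFile(path.join(__dirname, 'snapshots', (fragment || 'index') + '.html'));
});

app.use(express.static(path.join(__dirname, 'public')));
app.listen(3000);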
The question is how you will fill the content of the resource
...?_escaped_fragment_=:path
There are different strategies:
Generate dynamic snapshots with PhantomJS every time a crawler asks for the resource
This consists of spawning a PhantomJS process at runtime, redirecting the content of the generated HTML page to the output and sending it back to the crawler (see the sketch after this list).
I think this is the most versatile and transparent solution if your website has a lot of dynamic crawlable content.
Generate static snapshots with PhantomJS at build time or when hitting the save button of the CMS of the website
This is good if your crawlable content never changes, or only changes from time to time.
Generate static « equivalent » content files at dev time or when hitting the save button of the CMS of the website
This is a very cheap solution as it does not involve PhantomJS. It is good if the content is simple and if you can easily write it or generate it from a database.
It is difficult to maintain if the content is complicated to retrieve, as you will need to duplicate your code (one version client-side to render the Angular views, and one server-side to generate the whole-page « equivalent » content for crawlers).
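For the first strategy, here is a rough sketch of what runtime snapshot generation could look like. The file name phantom-render.js, the 2-second render delay and the way the Node server spawns the process are all assumptions made for the example, not part of any particular library:

// phantom-render.js -- executed by PhantomJS, not by Node:
// load the page, wait for Angular to finish rendering, print the HTML to stdout.
var page = require('webpage').create();
var url = require('system').args[1];

page.open(url, function (status) {
  if (status !== 'success') {
    phantom.exit(1);
  }
  // crude wait so the Angular views have time to render (2 s is just a guess)
  setTimeout(function () {
    console.log(page.content);
    phantom.exit(0);
  }, 2000);
});

On the server side you could then spawn PhantomJS for each crawler request and send its output back:

// Node side (sketch): run PhantomJS for a given URL and hand the HTML to a callback.
var execFile = require('child_process').execFile;

function renderSnapshot(url, callback) {
  execFile('phantomjs', ['phantom-render.js', url], { timeout: 15000 },
    function (err, stdout) {
      callback(err, stdout); // stdout is the rendered HTML page
    });
}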
I mentioned the PhantomJS solution, but any headless browser (or a regular one, if you can afford a display) will do the work. You could even imagine rendering your views server-side without any browser at all, just by running your JS in a NodeJS server, for instance.
Also decide from the beginning whether you will use HTML5-style URLs or hashbang URLs. This can be difficult to change once the content has been indexed by search engines. I advise the hashbang style even if it can be seen as « ugly ».
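For completeness, this is where that choice is made in AngularJS (the module name 'app' is arbitrary here):

// AngularJS routing config: hashbang URLs (#!/home) vs HTML5-style URLs (/home).
angular.module('app').config(['$locationProvider', function ($locationProvider) {
  // hashbang style: URLs look like http://example.com/#!/home
  $locationProvider.hashPrefix('!');

  // or HTML5 style (needs server-side rewriting of deep links to index.html):
  // $locationProvider.html5Mode(true);
}]);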
My solution to make an Angular application crawlable by Google, used on aisel.co.
Add this rule to your .htaccess:
# Serve a pre-generated snapshot whenever the crawler sends ?_escaped_fragment_=...,
# unless the request already points into the snapshots directory
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteCond %{REQUEST_URI} !^/snapshots/views/ [NC]
RewriteRule ^(.*)/?$ /snapshots/views/%1 [L]
Create a node.js script for the snapshots, and run it in the terminal: node snapshots.js
var htmlSnapshots = require('html-snapshots');
var fs = require('fs');

// Render each route in a headless browser and write the result as a static HTML file
var result = htmlSnapshots.run({
  input: "array",
  source: [
    "http://aisel.dev/#!/",
    "http://aisel.dev/#!/contact/",
    "http://aisel.dev/#!/page/about-aisel"
  ],
  outputDir: "web/snapshots",
  outputDirClean: true,
  // the snapshot is taken once this selector appears in the rendered page
  selector: ".navbar-header",
  timeout: 10000
}, function (err, snapshotsCompleted) {
  // snapshots are written under web/snapshots/#!/..., rename that folder to views/
  fs.rename('web/snapshots/#!', 'web/snapshots/views', function (err) {
    if (err) console.log('ERROR: ' + err);
  });
});
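Note that the script relies on the html-snapshots package, so install it first (plain npm assumed):

npm install html-snapshots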
Make sure that everything works with curl; type in the terminal:
curl http://aisel.dev/\?_escaped_fragment_\=/page/about-aisel/
This should show the contents of the snapshot .../www/aisel.dev/public/web/snapshots/views/page/about-aisel/index.html
Do not forget the directive for Google and other crawlers: your app should contain this meta rule in the head:
<meta name="fragment" content="!">
The full specification from Google is here: https://developers.google.com/webmasters/ajax-crawling/docs/specification