
Single Page App + Amazon S3 + Amazon CloudFront + Prerender.io - how to set up?

  1. I have a single-page app built with Backbone.js.
  2. I host the app (it consists of static files only) on Amazon S3.
  3. I use CloudFront as a CDN in front of the bucket.
  4. The app is accessed via https://myapp.com -> https://abcdefgh34545.cloudfront.net -> https://myBucket.s3-eu-west-1.amazonaws.com/index.html

How can I use the Prerender.io service with this stack? I have to somehow detect that a web spider/robot is accessing the page and redirect it to prerender.io...

asked Mar 13 '14 15:03 by user606521




2 Answers

You can use Lambda@Edge to configure CloudFront to send crawler HTTP requests directly to prerender.io.

The basic idea is to have a viewer-request handler that sets custom HTTP headers on requests which should be sent to prerender.io. For example, this Lambda@Edge code:

        'use strict';
        // Viewer-request handler: tags crawler requests so that the
        // origin-request handler can route them to prerender.io.
        // '${PrerenderToken}' is a placeholder: replace it with your prerender.io token.

        // The "i" flag matters: real user agents are capitalised differently
        // (e.g. "Twitterbot", "Baiduspider", "LinkedInBot").
        const CRAWLER_UA = /baiduspider|Facebot|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator/i;

        exports.handler = (event, context, callback) => {
            const request = event.Records[0].cf.request;
            const headers = request.headers;
            const user_agent = headers['user-agent'];
            const host = headers['host'];
            if (user_agent && host && CRAWLER_UA.test(user_agent[0].value)) {
                headers['x-prerender-token'] = [{ key: 'X-Prerender-Token', value: '${PrerenderToken}'}];
                headers['x-prerender-host'] = [{ key: 'X-Prerender-Host', value: host[0].value}];
            }
            callback(null, request);
        };
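The tagging logic can be tried locally before deploying. The following is only a sketch: `tagCrawlerRequest` and `fakeRequest` are hypothetical helper names, the bot list is shortened, the token is a dummy value, and the request object contains just the fields the handler reads.

```javascript
// Shortened, illustrative bot list; the "i" flag makes matching case-insensitive.
const CRAWLER_UA = /facebookexternalhit|twitterbot|slackbot|linkedinbot/i;

// Same tagging logic as the handler, as a plain function that runs without Lambda.
function tagCrawlerRequest(request, token) {
  const ua = request.headers['user-agent'];
  const host = request.headers['host'];
  if (ua && host && CRAWLER_UA.test(ua[0].value)) {
    request.headers['x-prerender-token'] = [{ key: 'X-Prerender-Token', value: token }];
    request.headers['x-prerender-host'] = [{ key: 'X-Prerender-Host', value: host[0].value }];
  }
  return request;
}

// Builds a minimal request using the header layout Lambda@Edge passes to handlers.
function fakeRequest(userAgent) {
  return {
    uri: '/user/alice',
    headers: {
      'user-agent': [{ key: 'User-Agent', value: userAgent }],
      host: [{ key: 'Host', value: 'myapp.com' }],
    },
  };
}

const bot = tagCrawlerRequest(fakeRequest('Twitterbot/1.0'), 'dummy-token');
const human = tagCrawlerRequest(fakeRequest('Mozilla/5.0 (X11; Linux x86_64)'), 'dummy-token');

console.log('x-prerender-host' in bot.headers);   // true
console.log('x-prerender-host' in human.headers); // false
```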

The CloudFront distribution must be configured to pass the X-Prerender-Host and X-Prerender-Token headers through to the origin.
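As a rough illustration of what "pass through" means, here is the relevant part of a cache behaviour written as a JavaScript object. It mirrors the `ForwardedValues` property of a CloudFormation `DefaultCacheBehavior` and assumes that legacy setting is in use; a distribution that uses cache policies or origin request policies would list the headers there instead. `missingPrerenderHeaders` is a hypothetical helper for checking a configuration.

```javascript
// Fragment of a cache behaviour (assumption: legacy ForwardedValues settings).
const cacheBehavior = {
  ForwardedValues: {
    QueryString: false,
    // Headers listed here are forwarded to the origin-request handler and the origin.
    Headers: ['X-Prerender-Token', 'X-Prerender-Host'],
  },
};

// Returns the prerender headers that a behaviour does NOT forward (empty = OK).
function missingPrerenderHeaders(behavior) {
  const forwarded = behavior.ForwardedValues.Headers.map((h) => h.toLowerCase());
  return ['x-prerender-token', 'x-prerender-host'].filter((h) => !forwarded.includes(h));
}

console.log(missingPrerenderHeaders(cacheBehavior)); // []
```

Forwarded headers also become part of the cache key, so tagged crawler requests and ordinary browser requests are cached separately.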

Finally, an origin-request handler changes the origin server if both X-Prerender-Token and X-Prerender-Host are present:

      'use strict';
      // Origin-request handler: if the viewer-request handler tagged the
      // request, replace the S3 origin with prerender.io.
      exports.handler = (event, context, callback) => {
          const request = event.Records[0].cf.request;
          const headers = request.headers;
          if (headers['x-prerender-token'] && headers['x-prerender-host']) {
              request.origin = {
                  custom: {
                      domainName: 'service.prerender.io',
                      port: 443,
                      protocol: 'https',
                      readTimeout: 20,
                      keepaliveTimeout: 5,
                      customHeaders: {},
                      // Include TLSv1.2: many HTTPS endpoints reject older versions.
                      sslProtocols: ['TLSv1', 'TLSv1.1', 'TLSv1.2'],
                      // CloudFront puts this path in front of the request URI, giving
                      // /https%3A%2F%2F<host><original path>
                      path: '/https%3A%2F%2F' + headers['x-prerender-host'][0].value
                  }
              };
          }
          callback(null, request);
      };
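To make the effect of the `path` setting concrete, the sketch below composes the URL that ends up being requested from prerender.io. It relies on CloudFront placing the origin path in front of the request URI; `prerenderOriginPath` and `prerenderUrl` are hypothetical names used only for illustration.

```javascript
const PRERENDER_HOST = 'service.prerender.io';

// Origin path as built by the handler: the scheme is percent-encoded,
// the host name is appended unchanged.
function prerenderOriginPath(host) {
  return '/' + encodeURIComponent('https://') + host;
}

// CloudFront prefixes the origin path to the request URI when contacting the origin.
function prerenderUrl(host, uri) {
  return 'https://' + PRERENDER_HOST + prerenderOriginPath(host) + uri;
}

console.log(prerenderUrl('myapp.com', '/user/alice'));
// https://service.prerender.io/https%3A%2F%2Fmyapp.com/user/alice
```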

There's a fully worked example at: https://github.com/jinty/prerender-cloudfront

answered Oct 06 '22 09:10 by Brian Sutherland


I managed to do this without using Prerender at all, by creating an AWS Lambda function, mapped via an API Gateway catch-all proxy, that:

  • Requests the origin page from CloudFront (it is actually always the same index.html)
  • Studies the path and figures out which resource the page is about (in my case it is simply /user/{name}, so I only have to handle one use case)
  • Makes a REST API request to get the dynamic data for the user
  • Regex-replaces the existing meta fields with the dynamic ones
  • Returns the new index file with the new metas
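The steps above can be sketched roughly as follows. This is not the answerer's actual code: every function name, the `og:title`/`og:description` properties and the `displayName`/`bio` fields are assumptions. The event and response shapes assume an API Gateway Lambda proxy integration (REST API style, with `event.path`), and the two fetch functions are injected so the sketch runs without network access.

```javascript
// Extracts {name} from /user/{name}; returns null for any other path.
function userFromPath(path) {
  const match = /^\/user\/([^/]+)\/?$/.exec(path);
  return match ? decodeURIComponent(match[1]) : null;
}

// Escapes characters that would break out of an HTML attribute value.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replaces the content of an existing <meta property="..." content="..."> tag.
// Assumes `property` contains no regex metacharacters.
function setMeta(html, property, value) {
  const pattern = new RegExp('(<meta\\s+property="' + property + '"\\s+content=")[^"]*(")', 'i');
  return html.replace(pattern, (m, open, close) => open + escapeAttr(value) + close);
}

// fetchIndex(): resolves to the index.html served by CloudFront.
// fetchUser(name): resolves to the user's data from the REST API.
function makeHandler(fetchIndex, fetchUser) {
  return async (event) => {
    let html = await fetchIndex();
    const name = userFromPath(event.path);
    if (name !== null) {
      const user = await fetchUser(name);
      html = setMeta(html, 'og:title', user.displayName);
      html = setMeta(html, 'og:description', user.bio);
    }
    return { statusCode: 200, headers: { 'Content-Type': 'text/html' }, body: html };
  };
}

// Local run with stubbed fetchers. In Lambda you would export the handler
// instead, e.g. exports.handler = makeHandler(realFetchIndex, realFetchUser);
const handler = makeHandler(
  async () => '<head><meta property="og:title" content="My App"></head>',
  async (name) => ({ displayName: 'Profile of ' + name, bio: 'Hello' })
);
handler({ path: '/user/alice' }).then((res) => console.log(res.body));
```

Injecting the fetchers keeps the meta rewriting testable on its own; the real versions would make HTTPS requests to the CloudFront URL and to the REST API.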

Then, in CloudFront, configure a new origin (the new Lambda function behind API Gateway) and a behaviour that maps /user/* requests to this new origin. Be sure to use the "HTTPS Only" Origin Protocol Policy for the origin: API Gateway is HTTPS-only, so a plain HTTP request gets redirected, and that redirect causes the hostname to change.

(If you used the redirecting setting by accident, you will need to invalidate "/*" afterwards: due to what seems to be a CloudFront bug, the configuration change alone will not help. I spent multiple hours debugging this last night.)

answered Oct 06 '22 08:10 by Render