 

How to improve SEO for Serverless Websites?

I want to improve SEO (i.e., have my pages correctly indexed by search engines) in a serverless architecture where my website is hosted on AWS S3.

As I'm using a JavaScript approach to routing (something akin to Angular, but simpler) and fetching dynamic content to fill the meta tags, everything is quite troublesome for scrapers without JavaScript support, such as Facebook's.

I already have default meta tags in place, and those are of course loaded just fine, but I need the updated ones.

I know most people use pre-rendering on a server or through something like Prerender.io, but I really wanted to find an alternative that makes sense in a serverless approach.

I thought I had it figured out, since Open Graph meta tags allow for a "pointer" URL where you can serve a meta-tags-only HTML page if needed. So I was thinking of using a Lambda function to generate the HTML response with the right meta tags on a GET request. The problem is: since the Facebook scraper has no JavaScript support, how can I send the dynamic content with the GET request?

asked Nov 20 '16 at 22:11 by João Moreira



1 Answer

If you are serving from S3, you must prerender the pages before uploading them. You can't call Lambda functions on the fly, because the crawler will not execute the JavaScript that would call them. You can't use Prerender.io with S3 either.

Suggestion:

  1. Host your website locally.
  2. Use PhantomJS (a headless browser) to load each page, let its JavaScript run, and save the resulting HTML as a prerendered version.
  3. Upload each page to S3 under a key that matches its address*.

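* E.g.: the address example.com/about/us must be mapped to a file named us (no .html extension) inside a folder named about in your bucket root, uploaded with the Content-Type text/html, because S3 website hosting serves the object whose key exactly matches the requested path.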
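Steps 2 and 3 boil down to turning each page URL into an object key and saving the rendered HTML under it. A minimal Python sketch of that mapping, assuming S3 static website hosting (exact key lookup, with `index.html` as the index document for paths ending in `/`). `path_to_s3_key` and `write_snapshot` are hypothetical helpers, and the HTML string would come from the PhantomJS step:

```python
import os
from urllib.parse import urlsplit


def path_to_s3_key(url: str) -> str:
    """Map a full URL (https://example.com/about/us) or a bare path (/about/us)
    to the S3 object key that should hold the prerendered snapshot.

    Assumes S3 website hosting: keys must match the request path exactly,
    so no ".html" suffix is added; paths ending in "/" (and the site root)
    resolve to "index.html", the assumed index document name.
    """
    path = urlsplit(url).path.lstrip("/")
    if path == "" or path.endswith("/"):
        path += "index.html"
    return path


def write_snapshot(out_dir: str, url: str, html: str) -> str:
    """Save prerendered HTML in a local folder tree that mirrors the bucket layout."""
    key = path_to_s3_key(url)
    local_path = os.path.join(out_dir, *key.split("/"))
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    with open(local_path, "w", encoding="utf-8") as f:
        f.write(html)
    return local_path
```

The snapshot folder can then be uploaded with something like `aws s3 sync ./snapshots s3://your-bucket/ --content-type text/html` (folder and bucket names hypothetical). Use that flag only for the HTML snapshots: extensionless keys would otherwise not be detected as HTML, while your JS and CSS assets need their own content types.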

Now, your users and the crawlers will see exactly the same pages, without needing JavaScript to load the initial state. The difference is that with JavaScript enabled, your framework (Angular?) will load the JS dependencies (routes, services, etc.) and take control like a normal SPA. When the user clicks to browse another page, the SPA will load the new inner content without a full page reload.
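To check that a prerendered page really exposes its tags to a scraper without JavaScript, you can parse the HTML exactly as served and list the meta tags found before any script runs. A minimal sketch using only Python's standard `html.parser`; `MetaTagCollector` and `meta_tags_seen_without_js` are hypothetical names:

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta property/name=... content=...> pairs from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> also lands here via handle_startendtag.
        if tag != "meta":
            return
        a = dict(attrs)
        key = a.get("property") or a.get("name")
        if key and a.get("content") is not None:
            self.tags[key] = a["content"]


def meta_tags_seen_without_js(html: str) -> dict[str, str]:
    """Return the meta tags present in the HTML as delivered; scripts are never run."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags
```

Feeding it the body of a deployed page (for example, fetched with `urllib.request.urlopen`) shows roughly what a non-JS crawler gets: if `og:title` only appears after your router runs, it will be missing here.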

Pros:

  • Easy to set up.
  • Very fast to serve content. You can also use CloudFront to improve the speed.

Cons:

  • If you have 1000 pages (e.g., 1000 products that you sell in your store), you need to make 1000 prerendered pages.
  • If your page data changes frequently, you need to prerender frequently.
  • Sometimes the crawler will index old content*.

* The crawler will see the old content, but the user will probably see the current content as the SPA framework will take control of the page and load the inner content again.


You said that you are using S3. If you want to prerender on the fly, S3 alone can't do it. You need the following chain:

Route 53 => CloudFront => API Gateway => Lambda

Configure:
- Set the API Gateway endpoint as the CloudFront origin.
- Use "HTTPS Only" as the "Origin Protocol Policy" (CloudFront).
- Use a Lambda proxy integration in API Gateway, so the function receives the full request.

In this case, your Lambda function will know the requested address and will be able to correctly render the requested HTML page.
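A minimal sketch of such a proxy handler in Python, assuming an API Gateway Lambda proxy integration (REST APIs put the request path in `event["path"]`, HTTP APIs in `event["rawPath"]`). The `PAGES` dict stands in for the database lookup, and `SITE`, `/app.js`, and the page fields are hypothetical:

```python
import html

SITE = "https://example.com"  # hypothetical public origin, used for og:url

# Hypothetical stand-in for the database query the answer mentions.
PAGES = {
    "/products/42": {
        "title": "Blue Mug",
        "description": "A 350 ml ceramic mug.",
        "image": "https://example.com/img/42.jpg",
    },
}


def render_page(path: str, page: dict) -> str:
    """Build a full HTML document whose Open Graph tags are already filled in."""
    title = html.escape(page["title"])
    desc = html.escape(page["description"])
    image = html.escape(page["image"])
    url = html.escape(SITE + path)
    return (
        "<!doctype html><html><head>"
        f"<title>{title}</title>"
        f'<meta property="og:title" content="{title}">'
        f'<meta property="og:description" content="{desc}">'
        f'<meta property="og:image" content="{image}">'
        f'<meta property="og:url" content="{url}">'
        "</head><body>"
        '<div id="app"></div>'
        '<script src="/app.js" defer></script>'  # SPA bundle still takes over
        "</body></html>"
    )


def handler(event, context):
    # REST API (v1) proxy events carry "path"; HTTP API (v2) events carry "rawPath".
    path = event.get("path") or event.get("rawPath") or "/"
    page = PAGES.get(path)
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if page is None:
        return {"statusCode": 404, "headers": headers, "body": "<h1>Not found</h1>"}
    return {"statusCode": 200, "headers": headers, "body": render_page(path, page)}
```

The dynamic content "travels" in the GET request itself: the path identifies the page, so the scraper never needs JavaScript, and browsers still load the SPA bundle afterwards.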

Pros:

  • As Lambda has access to the database, the rendered page will always be up to date.

Cons:

  • Pages load much more slowly, since every request runs the Lambda function (and its database queries) before any HTML is returned.
answered Oct 06 '22 at 07:10 by Zanon
