 

Cloudfront cache with GraphQL?

At my company we're using GraphQL for production apps, but only for private resources.

For now our public APIs are REST APIs fronted by CloudFront for caching. We want to turn them into GraphQL APIs, but the question is: how do we handle caching properly with GraphQL?

We thought about using a GET GraphQL endpoint and caching on the query string, but we are a bit afraid of the size of the requested URLs (we support IE9+ and sell to schools that sometimes sit behind really dumb proxies and firewalls).

So we would like to use a POST GraphQL endpoint, but... CloudFront cannot cache a request based on its body.

Does anyone have an idea or best practice to share? Thanks!

asked May 04 '17 by Jonathan Banon




3 Answers

The two best options today are:

  • Use a specialized caching solution, like FastQL.io
  • Use persisted queries with GET, where some queries are saved on your server and accessed by name via GET (a minimal sketch follows below)

*Full disclosure: I started FastQL after running into these issues without a good solution.
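
A minimal sketch of the persisted-queries-over-GET option, assuming an Express server and graphql-js v16; the route shape, query names, and toy schema are illustrative, not a prescribed layout:

const express = require('express');
const { graphql, buildSchema } = require('graphql');

// Named query documents baked into the server; clients never send query text,
// so URLs stay short and CloudFront can cache on path + query string.
const PERSISTED_QUERIES = {
  heroNameAndFriends: `
    query HeroNameAndFriends($episode: Int) {
      hero(episode: $episode) { name friends { name } }
    }`,
};

// Toy schema for illustration; your real schema goes here.
const schema = buildSchema(`
  type Character { name: String, friends: [Character] }
  type Query { hero(episode: Int): Character }
`);

const app = express();
app.get('/graphql/:queryName', async (req, res) => {
  const source = PERSISTED_QUERIES[req.params.queryName];
  if (!source) return res.status(404).json({ error: 'Unknown query' });

  // Variables arrive as one JSON-encoded query-string parameter so that
  // non-string types (Int, Boolean) survive the trip.
  const variableValues = req.query.variables ? JSON.parse(req.query.variables) : {};
  const result = await graphql({ schema, source, variableValues });

  // The Cache-Control header is what lets CloudFront cache this GET.
  res.set('Cache-Control', 'public, max-age=60');
  res.json(result);
});

app.listen(3000);

A client would then call something like GET /graphql/heroNameAndFriends?variables={"episode":3} (URL-encoded), and CloudFront can cache each name + variables combination.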

answered Oct 11 '22 by Zach Hobbs


I am not sure if it has a specific name, but I've seen a pattern in the wild where the GraphQL queries themselves are hosted on the backend with a specific ID. It's much less flexible, as it requires pre-defined queries to be baked in.

The client would just send the arguments/params and the ID of the pre-defined query to use, and that would be your cache key. It's similar to how HTTP caching works with an authenticated request to /my-profile, with CloudFront serving different responses based on the auth token in the headers.

How the client sends it depends on your backend's implementation of GraphQL. You could pass it either as a whitelisted header or in the query string.

So if the backend has defined a query that looks like

(Using pseudo code)

const MyQuery = gql`
query HeroNameAndFriends($episode: Int) {
  hero(episode: $episode) {
    name
    friends {
      name
    }
  }
}
`

Then your request would go to something like api.app.com/graphQL/MyQuery?episode=3.

That being said, have you actually measured that your queries wouldn't fit in a GET request? I'd say go with GET requests if CDN caching is what you need, and use the approach mentioned above for the requests that don't fit the limits.

Edit: it seems this has a name: Automatic Persisted Queries. https://www.apollographql.com/docs/apollo-server/performance/apq/
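
For reference, the APQ protocol in the linked docs boils down to a hash-only GET. A hand-rolled sketch (Apollo Client normally does this automatically; Node 18+ assumed for the global fetch):

const crypto = require('crypto');

async function apqGet(endpoint, query) {
  // The SHA-256 of the query text identifies it; the URL stays short
  // and CloudFront can cache on the query string.
  const sha256Hash = crypto.createHash('sha256').update(query).digest('hex');
  const extensions = JSON.stringify({ persistedQuery: { version: 1, sha256Hash } });
  const res = await fetch(`${endpoint}?extensions=${encodeURIComponent(extensions)}`);
  // A server that has never seen the hash replies with a PersistedQueryNotFound
  // error; the client then retries once with the full query attached so the
  // server can register it (retry not shown here).
  return res.json();
}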

Another alternative that lets you stay with POST requests is to use Lambda@Edge on your CloudFront distribution, with DynamoDB tables storing your cache entries, similar to how Cloudflare Workers do it:

// Cloudflare Worker: serve from the edge cache, fetch from the origin
// and store the response on a miss.
addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event))
})

async function handleRequest(event) {
  const cache = caches.default // the default edge cache for this data center
  let response = await cache.match(event.request)

  if (!response) {
    response = await fetch(event.request)
    if (response.ok) {
      // Store a copy without blocking the response to the client
      event.waitUntil(cache.put(event.request, response.clone()))
    }
  }

  return response
}
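
And a hedged sketch of the Lambda@Edge + DynamoDB variant itself, assuming an origin-request trigger with "Include body" enabled on the function association; the table name, region, and key attribute are assumptions for this sketch:

'use strict';
const crypto = require('crypto');
const AWS = require('aws-sdk'); // v2, bundled in the older Node Lambda runtimes

// Lambda@Edge does not support environment variables, so these are hard-coded.
const ddb = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });
const TABLE = 'graphql-edge-cache'; // hypothetical table, partition key "cacheKey"

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;

  // With "Include body" enabled, the POST body arrives base64-encoded.
  const body = Buffer.from(request.body.data, 'base64').toString();
  const cacheKey = crypto.createHash('sha256').update(body).digest('hex');

  const cached = await ddb.get({ TableName: TABLE, Key: { cacheKey } }).promise();
  if (cached.Item) {
    // Returning a response object from a request trigger generates the
    // response at the edge; the origin is never contacted.
    return {
      status: '200',
      headers: { 'content-type': [{ key: 'Content-Type', value: 'application/json' }] },
      body: cached.Item.response,
    };
  }

  // Cache miss: forward to the origin. Populating the table (from the origin
  // itself, or an origin-response trigger) is left out of this sketch.
  return request;
};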

Some reading material on that

  • https://aws.amazon.com/blogs/networking-and-content-delivery/lambdaedge-design-best-practices/
  • https://aws.amazon.com/blogs/networking-and-content-delivery/leveraging-external-data-in-lambdaedge/
answered Oct 11 '22 by Seivan


An option I've explored on paper but not yet implemented is to use Lambda@Edge in request trigger mode to transform a client POST to a GET, which can then result in a cache hit.

This way clients can still use POST to send GQL requests, and you're working with a small number of controlled services within AWS when trying to work out the max URL length for the converted GET request (and these limits are generally quite high).

There will still be a length limit, but once you have 16 kB+ GQL requests, it's probably time to take the other suggestion of using pre-defined queries on the server and just referencing them by name.

It does have the disadvantage that request-trigger Lambdas run on every request, even on a cache hit, so they will generate some cost, although the Lambda itself should be very fast and simple.
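
Since this answer is explicitly "on paper", the following is only a sketch of the shape such a viewer-request trigger might take, not a verified recipe; it assumes "Include body" is enabled on the trigger, and whether CloudFront actually honors a rewritten method from a request trigger is precisely the unverified part:

'use strict';

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;

  if (request.method === 'POST' && request.body && request.body.data) {
    // With "Include body" enabled, the POST body arrives base64-encoded.
    const gql = Buffer.from(request.body.data, 'base64').toString();

    // ASSUMPTION: that the method field can be rewritten here is exactly the
    // untested premise of this answer.
    request.method = 'GET';
    request.querystring = 'payload=' + encodeURIComponent(gql);

    // Drop the now-redundant body.
    request.body = { action: 'replace', encoding: 'text', data: '' };
  }

  return request;
};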

answered Oct 11 '22 by Adrian Baker