
What is the idiomatic, performant way to resolve related objects?

How do you write query resolvers in GraphQL that perform well against a relational database?

Using the example schema from this tutorial, let's say I have a simple database with users and stories. Users can author multiple stories but stories only have one user as their author (for simplicity).

When querying for a user, one might also want a list of all stories authored by that user. One possible definition of a GraphQL query to handle that (taken from the tutorial linked above):

const Query = new GraphQLObjectType({
  name: 'Query',
  fields: () => ({
    user: {
      type: User,
      args: {
        id: {
          type: new GraphQLNonNull(GraphQLID)
        }
      },
      resolve(parent, {id}, {db}) {
        return db.get(`
          SELECT * FROM User WHERE id = $id
          `, {$id: id});
      }
    },
  })
});

const User = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    id: {
      type: GraphQLID
    },
    name: {
      type: GraphQLString
    },
    stories: {
      type: new GraphQLList(Story),
      resolve(parent, args, {db}) {
        return db.all(`
          SELECT * FROM Story WHERE author = $user
        `, {$user: parent.id});
      }
    }
  })
});

This works as expected: if I query a specific user, I can also get that user's stories when needed. However, it does not perform ideally. It requires two trips to the database when a single query with a JOIN would have sufficed. The problem is amplified if I query multiple users, because every additional user results in an additional database query, and the number of queries multiplies the deeper I traverse my object relationships.
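To make the query count concrete, here is a minimal sketch that imitates the resolvers above against an in-memory stand-in for the database. The `makeFakeDb` and `resolveUsersWithStories` helpers and the sample rows are hypothetical; they only count how many queries the naive approach would issue.

```javascript
// Hypothetical in-memory stand-in for the database; it only counts queries.
function makeFakeDb(users, stories) {
  const db = {
    queryCount: 0,
    getUser(id) {
      db.queryCount += 1;
      return users.find((u) => u.id === id);
    },
    getStoriesByAuthor(authorId) {
      db.queryCount += 1;
      return stories.filter((s) => s.author === authorId);
    },
  };
  return db;
}

// Mimics the naive resolvers: one query per user, plus one more per user
// to fetch that user's stories.
function resolveUsersWithStories(db, userIds) {
  return userIds.map((id) => {
    const user = db.getUser(id);
    return { ...user, stories: db.getStoriesByAuthor(user.id) };
  });
}

const fakeDb = makeFakeDb(
  [{ id: 1, name: 'Ann' }, { id: 2, name: 'Bob' }, { id: 3, name: 'Cy' }],
  [{ id: 10, author: 1 }, { id: 11, author: 1 }, { id: 12, author: 3 }]
);
const resolved = resolveUsersWithStories(fakeDb, [1, 2, 3]);
console.log(fakeDb.queryCount); // 6: two queries per requested user
```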

Has this problem been solved? Is there a way to write a query resolver that won't result in inefficient SQL queries being generated?

ean5533 asked Mar 02 '16 20:03


1 Answer

There are two approaches to this kind of problem.

One approach, used by Facebook, is to enqueue the requests that happen within one tick and combine them before sending. This way, instead of issuing one request per user, you issue a single request that retrieves information about several users. Dan Schafer wrote a good comment explaining this approach. Facebook released DataLoader, which is an example implementation of this technique.

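const DataLoader = require('dataloader');

// Pass this to graphql-js context
const storyLoader = new DataLoader((authorIds) => {
  // Bind the ids as parameters instead of interpolating them into the SQL.
  // This assumes the driver accepts positional `?` parameters as an array.
  const placeholders = authorIds.map(() => '?').join(',');
  return db.all(
    `SELECT * FROM Story WHERE author IN (${placeholders})`,
    authorIds
  ).then((rows) => {
    // Group rows by author
    const result = {};
    for (const row of rows) {
      const existing = result[row.author] || [];
      existing.push(row);
      result[row.author] = existing;
    }
    // Return one list per key, in the same order as authorIds
    const array = [];
    for (const author of authorIds) {
      array.push(result[author] || []);
    }
    return array;
  });
});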

// Then use dataloader in your type
const User = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    id: {
      type: GraphQLID
    },
    name: {
      type: GraphQLString
    },
    stories: {
      type: new GraphQLList(Story),
      resolve(parent, args, {rootValue: {storyLoader}}) {
        return storyLoader.load(parent.id);
      }
    }
  })
});

While this does not produce the most efficient SQL (it still issues a separate query for each level of the object graph rather than one JOIN), it may be good enough for many use cases and will make things run faster. It is also a good approach for non-relational databases that don't allow JOINs.
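To show what "combining requests that happen in one tick" means, here is a minimal hand-rolled sketch of the batching idea. It is not DataLoader itself: `createBatchLoader` is a hypothetical helper with no caching and no key de-duplication, and it assumes a Node.js environment for `process.nextTick`.

```javascript
// Minimal batching loader: collects keys requested in the same tick and
// hands them to batchFn in a single call. Simplified illustration only.
function createBatchLoader(batchFn) {
  let queue = [];

  function flush() {
    const pending = queue;
    queue = [];
    const keys = pending.map((entry) => entry.key);
    Promise.resolve()
      .then(() => batchFn(keys))
      .then(
        (values) => pending.forEach((entry, i) => entry.resolve(values[i])),
        (error) => pending.forEach((entry) => entry.reject(error))
      );
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        // Schedule a flush only for the first key of a batch.
        if (queue.length === 0) {
          process.nextTick(flush);
        }
        queue.push({ key, resolve, reject });
      });
    },
  };
}

// Hypothetical story rows and a batch function that records each call.
const storyRows = [
  { id: 10, author: 1 }, { id: 11, author: 1 }, { id: 12, author: 3 },
];
const batchCalls = [];
const demoLoader = createBatchLoader((authorIds) => {
  batchCalls.push(authorIds);
  // One result per key, in the same order as the keys.
  return authorIds.map((id) => storyRows.filter((row) => row.author === id));
});

const demo = Promise.all([1, 2, 3].map((id) => demoLoader.load(id)));
demo.then((lists) => {
  console.log(batchCalls.length); // 1: three loads, one batch call
  console.log(lists.map((list) => list.length)); // [ 2, 0, 1 ]
});
```

The three `load` calls run synchronously, so they all land in the queue before the scheduled flush fires, which is why the batch function is called only once.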

Another approach is to use the information about the requested fields inside the resolve function and issue a JOIN only when it is relevant. The resolve info has a fieldASTs field (renamed fieldNodes in later graphql-js releases), which holds the parsed AST of the part of the query currently being resolved. By looking through the children of that AST (its selectionSet), we can predict whether we need a join. A very simplified and clunky example:

const User = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    id: {
      type: GraphQLID
    },
    name: {
      type: GraphQLString
    },
    stories: {
      type: new GraphQLList(Story),
      resolve(parent, args, {rootValue: {db}}) {
        // if stories were pre-fetched by the parent resolver, use them
        if (parent.stories) {
          return parent.stories;
        } else {
          // otherwise request them normally
          return db.all(`
            SELECT * FROM Story WHERE author = $user
          `, {$user: parent.id});
        }
      }
    }
  })
});

const Query = new GraphQLObjectType({
  name: 'Query',
  fields: () => ({
    user: {
      type: User,
      args: {
        id: {
          type: new GraphQLNonNull(GraphQLID)
        }
      },
      resolve(parent, {id}, {rootValue: {db}, fieldASTs}) {
        // find names of all child fields
        const childFields = fieldASTs[0].selectionSet.selections.map(
          (set) => set.name.value
        );
        if (childFields.includes('stories')) {
          // use join to optimize
          return db.all(`
            SELECT * FROM User INNER JOIN Story ON User.id = Story.author WHERE User.id = $id
          `, {$id: id}).then((rows) => {
            if (rows.length > 0) {
              return {
                id: rows[0].author,
                name: rows[0].name,
                stories: rows
              };
            } else {
              return db.get(`
                SELECT * FROM User WHERE id = $id
                `, {$id: id}
              );
            }
          });
        } else {
          return db.get(`
            SELECT * FROM User WHERE id = $id
            `, {$id: id}
          );
        }
      }
    },
  })
});
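One rough edge in the example above is that `SELECT *` over a JOIN returns colliding column names (both tables have an `id`), and an INNER JOIN returns no rows for a user without stories, hence the fallback query. A possible refinement, sketched here under assumptions, is to alias the columns, use a LEFT JOIN, and nest the flat rows by hand. The aliases, the `Story.title` column, and the `nestUserRows` helper are all hypothetical.

```javascript
// Assumed query (column aliases and the Story.title column are hypothetical):
//   SELECT User.id AS user_id, User.name AS user_name,
//          Story.id AS story_id, Story.title AS story_title
//   FROM User LEFT JOIN Story ON User.id = Story.author
//   WHERE User.id = $id
function nestUserRows(rows) {
  if (rows.length === 0) return null; // no such user
  const [first] = rows;
  return {
    id: first.user_id,
    name: first.user_name,
    // With a LEFT JOIN, a user without stories yields a single row whose
    // story columns are NULL, so those rows are filtered out here.
    stories: rows
      .filter((row) => row.story_id !== null)
      .map((row) => ({
        id: row.story_id,
        title: row.story_title,
        author: row.user_id,
      })),
  };
}

const withStories = nestUserRows([
  { user_id: 1, user_name: 'Ann', story_id: 10, story_title: 'First' },
  { user_id: 1, user_name: 'Ann', story_id: 11, story_title: 'Second' },
]);
const withoutStories = nestUserRows([
  { user_id: 2, user_name: 'Bob', story_id: null, story_title: null },
]);
console.log(withStories.stories.length); // 2
console.log(withoutStories.stories.length); // 0
```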

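Note that this can break on, for example, fragments: a fragment spread in the selection set carries the fragment's name rather than a field name, and an inline fragment has no name at all. These cases can be handled too; it is just a matter of inspecting the selection set in more detail.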
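As a rough sketch of that more detailed inspection, the helper below walks a selection set and follows fragment spreads and inline fragments. `collectFieldNames` is a hypothetical helper, and the AST objects are written by hand in the shape graphql-js uses for parsed queries (`Field`, `FragmentSpread`, `InlineFragment`); in a real resolver the fragment map would come from the `fragments` property of the resolve info. It ignores `@skip` and `@include` directives.

```javascript
// Collects the names of fields selected directly under a node, following
// fragment spreads and inline fragments. `fragments` maps fragment names to
// their definitions.
function collectFieldNames(selectionSet, fragments, names = new Set()) {
  if (!selectionSet) return names;
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      names.add(selection.name.value);
    } else if (selection.kind === 'InlineFragment') {
      collectFieldNames(selection.selectionSet, fragments, names);
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value];
      if (fragment) {
        collectFieldNames(fragment.selectionSet, fragments, names);
      }
    }
  }
  return names;
}

// Hand-written AST for:
//   { user(id: 1) { name ...UserStories } }
//   fragment UserStories on User { stories { id } }
const userFieldNode = {
  kind: 'Field',
  name: { kind: 'Name', value: 'user' },
  selectionSet: {
    kind: 'SelectionSet',
    selections: [
      { kind: 'Field', name: { kind: 'Name', value: 'name' } },
      { kind: 'FragmentSpread', name: { kind: 'Name', value: 'UserStories' } },
    ],
  },
};
const fragmentMap = {
  UserStories: {
    kind: 'FragmentDefinition',
    name: { kind: 'Name', value: 'UserStories' },
    selectionSet: {
      kind: 'SelectionSet',
      selections: [
        { kind: 'Field', name: { kind: 'Name', value: 'stories' } },
      ],
    },
  },
};

const selected = collectFieldNames(userFieldNode.selectionSet, fragmentMap);
console.log(selected.has('stories')); // true
console.log(selected.has('UserStories')); // false
```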

There is currently a PR in the graphql-js repository that will allow more complex query-optimization logic by providing a 'resolve plan' in the context.

freiksenet answered Nov 04 '22 10:11