How do you write query resolvers in GraphQL that perform well against a relational database?
Using the example schema from this tutorial, let's say I have a simple database with users and stories. Users can author multiple stories, but stories only have one user as their author (for simplicity).
When querying for a user, one might also want to get a list of all stories authored by that user. One possible definition of a GraphQL query to handle that (stolen from the above linked tutorial):
const Query = new GraphQLObjectType({
  name: 'Query',
  fields: () => ({
    user: {
      type: User,
      args: {
        id: {
          type: new GraphQLNonNull(GraphQLID)
        }
      },
      resolve(parent, {id}, {db}) {
        return db.get(`
          SELECT * FROM User WHERE id = $id
        `, {$id: id});
      }
    },
  })
});
const User = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    id: {
      type: GraphQLID
    },
    name: {
      type: GraphQLString
    },
    stories: {
      type: new GraphQLList(Story),
      resolve(parent, args, {db}) {
        return db.all(`
          SELECT * FROM Story WHERE author = $user
        `, {$user: parent.id});
      }
    }
  })
});
This will work as expected; if I query a specific user, I'll be able to get that user's stories as well if needed. However, this does not perform ideally. It requires two trips to the database, when a single query with a JOIN would have sufficed. The problem is amplified if I query multiple users: every additional user will result in an additional database query. It gets worse still the deeper I traverse my object relationships, because the number of queries multiplies at every level.
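To make the cost concrete, here is a rough sketch that counts the queries issued when each user's stories are resolved separately. The `createCountingDb` helper and the `allUsers` and `storiesByAuthor` methods are hypothetical in-memory stand-ins, not part of the tutorial's schema or of any database driver.

```javascript
// Hypothetical in-memory stand-in for the database; it only counts "queries".
function createCountingDb(users, stories) {
  let queryCount = 0;
  return {
    allUsers() {
      queryCount += 1;
      return users;
    },
    storiesByAuthor(authorId) {
      queryCount += 1;
      return stories.filter((story) => story.author === authorId);
    },
    getQueryCount() {
      return queryCount;
    },
  };
}

// Mimics per-field resolution: one query for the list, one more per user.
function resolveUsersWithStories(db) {
  return db.allUsers().map((user) => ({
    ...user,
    stories: db.storiesByAuthor(user.id),
  }));
}

const countingDb = createCountingDb(
  [{ id: 1, name: 'Ann' }, { id: 2, name: 'Bob' }, { id: 3, name: 'Cy' }],
  [{ id: 10, author: 1 }, { id: 11, author: 1 }, { id: 12, author: 3 }]
);
const usersWithStories = resolveUsersWithStories(countingDb);
console.log(countingDb.getQueryCount()); // 4: one for the users, three for their stories
```

With N users this pattern issues 1 + N queries, where a single JOIN would issue one.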
Has this problem been solved? Is there a way to write a query resolver that won't result in inefficient SQL queries being generated?
There are two approaches to this kind of problem.
One approach, used by Facebook, is to enqueue the requests that happen within one tick and combine them before sending. This way, instead of issuing a request for each user, you can issue one request that retrieves information about several users. Dan Schafer wrote a good comment explaining this approach. Facebook released DataLoader, which is an example implementation of this technique.
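const DataLoader = require('dataloader');

// Pass this to graphql-js context
const storyLoader = new DataLoader((authorIds) => {
  // One "?" placeholder per id, so the values are bound instead of being
  // interpolated into the SQL string. This assumes db.all accepts an array
  // of positional parameters, as the sqlite3 driver does.
  const placeholders = authorIds.map(() => '?').join(',');
  return db.all(
    `SELECT * FROM Story WHERE author IN (${placeholders})`,
    [...authorIds]
  ).then((rows) => {
    // Group the rows by author
    const result = {};
    for (const row of rows) {
      const existing = result[row.author] || [];
      existing.push(row);
      result[row.author] = existing;
    }
    // Order the groups so they match the order of authorIds
    const array = [];
    for (const author of authorIds) {
      array.push(result[author] || []);
    }
    return array;
  });
});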
// Then use dataloader in your type
const User = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    id: {
      type: GraphQLID
    },
    name: {
      type: GraphQLString
    },
    stories: {
      type: new GraphQLList(Story),
      resolve(parent, args, {rootValue: {storyLoader}}) {
        return storyLoader.load(parent.id);
      }
    }
  })
});
While this doesn't resolve to the most efficient SQL, it may still be good enough for many use cases and will make things run faster. It's also a good approach for non-relational databases that don't allow JOINs.
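To show what "combining the requests of one tick" means mechanically, here is a minimal sketch of the batching idea on its own. It is not the DataLoader library and leaves out its caching and key de-duplication; `createBatchLoader` and the fake story data are made up for illustration.

```javascript
// Minimal per-tick batching sketch (an illustration, not DataLoader itself).
function createBatchLoader(batchFn) {
  let queue = []; // pending { key, resolve, reject } entries for the current tick

  function flush() {
    const pending = queue;
    queue = [];
    const keys = pending.map((entry) => entry.key);
    // Call the batch function once for all keys collected during this tick.
    Promise.resolve()
      .then(() => batchFn(keys))
      .then(
        (values) => pending.forEach((entry, i) => entry.resolve(values[i])),
        (error) => pending.forEach((entry) => entry.reject(error))
      );
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        if (queue.length === 0) {
          // First key of this tick: schedule a single flush.
          process.nextTick(flush);
        }
        queue.push({ key, resolve, reject });
      });
    },
  };
}

// Usage with fake data standing in for the Story table.
const fakeStories = [
  { id: 10, author: 1 },
  { id: 11, author: 1 },
  { id: 12, author: 2 },
];
let batchCalls = 0;
const sketchLoader = createBatchLoader(async (authorIds) => {
  batchCalls += 1;
  return authorIds.map((id) => fakeStories.filter((s) => s.author === id));
});

Promise.all([sketchLoader.load(1), sketchLoader.load(2)]).then(([a, b]) => {
  console.log(batchCalls, a.length, b.length); // prints: 1 2 1
});
```

Both `load` calls happen in the same tick, so the batch function runs once with `[1, 2]` instead of once per author.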
Another approach is to use the information about the requested fields inside the resolve function and use a JOIN when it is relevant. The resolve context has a fieldASTs field, which holds the parsed AST of the part of the query currently being resolved. By looking through the children of that AST (selectionSet), we can predict whether we need a join. A very simplified and clunky example:
const User = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    id: {
      type: GraphQLID
    },
    name: {
      type: GraphQLString
    },
    stories: {
      type: new GraphQLList(Story),
      // db (not storyLoader) is what this resolver actually uses
      resolve(parent, args, {rootValue: {db}}) {
        // if stories were pre-fetched use that
        if (parent.stories) {
          return parent.stories;
        } else {
          // otherwise request them normally
          return db.all(`
            SELECT * FROM Story WHERE author = $user
          `, {$user: parent.id});
        }
      }
    }
  })
});
const Query = new GraphQLObjectType({
  name: 'Query',
  fields: () => ({
    user: {
      type: User,
      args: {
        id: {
          type: new GraphQLNonNull(GraphQLID)
        }
      },
      resolve(parent, {id}, {rootValue: {db}, fieldASTs}) {
        // find names of all child fields
        const childFields = fieldASTs[0].selectionSet.selections.map(
          (set) => set.name.value
        );
        if (childFields.includes('stories')) {
          // use join to optimize
          return db.all(`
            SELECT * FROM User INNER JOIN Story ON User.id = Story.author WHERE User.id = $id
          `, {$id: id}).then((rows) => {
            if (rows.length > 0) {
              return {
                id: rows[0].author,
                name: rows[0].name,
                stories: rows
              };
            } else {
              return db.get(`
                SELECT * FROM User WHERE id = $id
              `, {$id: id});
            }
          });
        } else {
          return db.get(`
            SELECT * FROM User WHERE id = $id
          `, {$id: id});
        }
      }
    },
  })
});
Note that this could have problems with, e.g., fragments. However, one can handle them too; it's just a matter of inspecting the selection set in more detail.
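As a rough sketch of what "inspecting the selection set in more detail" could look like, the helper below walks a selection set and follows inline fragments and named fragment spreads. `collectFieldNames` is a made-up name, and the sketch assumes the resolve info exposes a `fragments` map (fragment name to definition) next to `fieldASTs`, which later graphql-js releases call `fieldNodes`. It ignores `@skip` and `@include` directives.

```javascript
// Collect the names of the fields selected directly under a field node,
// following inline fragments and named fragment spreads.
function collectFieldNames(selectionSet, fragments, names = new Set()) {
  if (!selectionSet) {
    return names;
  }
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      names.add(selection.name.value);
    } else if (selection.kind === 'InlineFragment') {
      collectFieldNames(selection.selectionSet, fragments, names);
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value];
      if (fragment) {
        collectFieldNames(fragment.selectionSet, fragments, names);
      }
    }
  }
  return names;
}

// Hand-written AST for: user(id: 1) { name ...UserParts }
// with: fragment UserParts on User { stories }
const exampleFragments = {
  UserParts: {
    kind: 'FragmentDefinition',
    selectionSet: {
      selections: [{ kind: 'Field', name: { value: 'stories' } }],
    },
  },
};
const userFieldNode = {
  kind: 'Field',
  name: { value: 'user' },
  selectionSet: {
    selections: [
      { kind: 'Field', name: { value: 'name' } },
      { kind: 'FragmentSpread', name: { value: 'UserParts' } },
    ],
  },
};

const selected = collectFieldNames(userFieldNode.selectionSet, exampleFragments);
console.log(selected.has('stories')); // true
```

In the resolver above, a check such as `collectFieldNames(fieldASTs[0].selectionSet, fragments).has('stories')` could then replace the `childFields.includes('stories')` test.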
There is currently a PR in the graphql-js repository that will allow writing more complex logic for query optimization by providing a 'resolve plan' in the context.