I have very common GraphQL schema like this (pseudocode):
Post {
commentsPage(skip: Int, limit: Int) {
total: Int
items: [Comment]
}
}
So to avoid n+1 problem when requesting multiple Post
objects I decided to use Facebook's Dataloader.
Since I'm working on Nest.JS 3-tier layered application (Resolver-Service-Repository), I have question:
should I wrap my repository methods with DataLoader or should I wrap my service methods with Dataloder?
Below is example of my service method that returns Comments
page (i.e. this method called from commentsPage
property resolver). Inside service method I'm using 2 repository methods (#count
and #find
):
@Injectable()
export class CommentsService {
constructor(
private readonly repository: CommentsRepository,
) {}
async getCommentsPage(postId, dataStart, dateEnd, skip, limit): PaginatedComments {
const counts = await this.repository.getCount(postId, dateStart, dateEnd);
const itemsDocs = await this.repository.find(postId, dateStart, dateEnd, skip, limit);
const items = this.mapDbResultToGraphQlType(itemsDocs);
return new PaginatedComments(total, items)
}
}
So should I create individual instances of Dataloader for each of repository method (#count
, #find
etc) or should I just wrap my entire service method with Dataloader (so my commentsPage
property resolver will just work with Dataloader not with service)?
The n+1 problem means that the GraphQL server executes many unnecessary round trips to fetch nested data. If you check the service log then you will see that rating Data Fetcher is called sequentially for each and every call to book.
GraphQL is a query language for your API, and a server-side runtime for executing queries by using a type system you define for your data. GraphQL isn't tied to any specific database or storage engine and is instead backed by your existing code and data. You can connect any database using GraphQL.
Is GraphQL a Database Technology? No. GraphQL is often confused with being a database technology. This is a misconception, GraphQL is a query language for APIs - not databases.
Disclaimer: I am not an expert in Nest.js but I have written a good bunch of dataloaders as well as worked with automatically generated dataloaders. I hope I can give a bit of insight nonetheless.
While your question seems to be a relatively simple either or question it is probably much more difficult than that. I think the actual problem is the following: Whether to use the dataloader pattern or not for a specific field needs to be decided on a per field basis. The repository+service pattern on the other hand tries to abstract away this decision by exposing abstract and powerful ways of data access. One way out would be to simply "dataloaderify" every method of your service. Unfortunately in practice this is not really feasable. Let's explore why!
Dataloader provides a promise cache to reduce dublicated calls to the database. For this cache to work all requests need to be simple key value lookups (e.g. userByIdLoader
, postsByUserIdLoader
). This quickly becomes not sufficient enough, like in one of your example your request to the repository has a lot of parameters:
this.repository.find(postId, dateStart, dateEnd, skip, limit);
Sure technically you could make { postId, dateStart, dateEnd, skip, limit }
your key and then somehow hash the content to generate a unique key.
When you implement a dataloader query it now suddenly has to work for a list of the inputs the initial query needed. Here a simple SQL example:
SELECT * FROM user WHERE id = ?
-- Dataloaded
SELECT * FROM user WHERE id IN ?
Okay now the repository example from above:
SELECT * FROM comment WHERE post_id = ? AND date < ? AND date > ? OFFSET ? LIMIT ?
-- Dataloaded
???
I have sometimes written queries that work for two parameters and they already become very difficult problems. This is why most dataloaders are simply load by id lookups. This tread on twitter discusses how a GraphQL API should only expose what can be efficiently queried. If you create service methods with strong filter methods you have the same problem even if your GraphQL API does not expose these filters.
The first thing to my understanding that Facebook does is match fields and service methods very closely. You could do the same. This way you can make a decision in the service method if you want to use a dataloader or not. For example I don't use dataloaders in root queries (e.g. { getPosts(filter: { createdBefore: "...", user: 234 }) { .. }
) but in subfields of types that appear in lists { getAllPosts { comments { ... } }
. The root query is not going to be executed in a loop and is therefore not exposed to the n+1 problem.
Your repository now exposes what can be "efficiently queried" (as in Lee's tweet) like foreign/primary key lookups or filtered find all queries. The service can then wrap for example the key lookups in a dataloader. Often I end up filtering small lists in my business logic. I think this is perfectly fine for small apps but might be problematic when you scale. The GraphQL Relay helpers for JavaScript do something similar when you use the connectionFromArray
function. The pagination is not done on the database level and this is probably okay for 90% of connections.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With