Should GraphQL DataLoader wrap request to database or wrap requests to service methods?

Tags:

graphql

nestjs

dataloader

I have a very common GraphQL schema like this:

type Post {
  commentsPage(skip: Int, limit: Int): CommentsPage
}

type CommentsPage {
  total: Int
  items: [Comment]
}

To avoid the N+1 problem when requesting multiple Post objects, I decided to use Facebook's DataLoader.

Since I'm working on a Nest.js 3-tier layered application (Resolver-Service-Repository), I have a question:

Should I wrap my repository methods with DataLoader, or should I wrap my service methods with DataLoader?

Below is an example of my service method that returns a Comments page (i.e. this method is called from the commentsPage property resolver). Inside the service method I use two repository methods (#getCount and #find):

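import { Injectable } from '@nestjs/common';
// CommentsRepository and PaginatedComments are application classes defined elsewhere.

@Injectable()
export class CommentsService {
    constructor(
        private readonly repository: CommentsRepository,
    ) {}

    async getCommentsPage(
        postId: string,
        dateStart: Date,
        dateEnd: Date,
        skip: number,
        limit: number,
    ): Promise<PaginatedComments> {
        // Total number of comments in the date range (needed for pagination).
        const total = await this.repository.getCount(postId, dateStart, dateEnd);
        // The requested page of comments.
        const itemsDocs = await this.repository.find(postId, dateStart, dateEnd, skip, limit);
        // mapDbResultToGraphQlType is a helper omitted from this excerpt.
        const items = this.mapDbResultToGraphQlType(itemsDocs);
        return new PaginatedComments(total, items);
    }
}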

So should I create an individual DataLoader instance for each repository method (#getCount, #find, etc.), or should I just wrap my entire service method with a DataLoader (so my commentsPage property resolver works with the DataLoader rather than with the service)?


asked Jul 25 '19 13:07

WelcomeTo

1 Answer

Disclaimer: I am not an expert in Nest.js, but I have written a good number of dataloaders and have also worked with automatically generated dataloaders. I hope I can give a bit of insight nonetheless.

What is the actual problem?

While your question looks like a relatively simple either/or question, it is probably much more difficult than that. I think the actual problem is this: whether or not to use the dataloader pattern has to be decided on a per-field basis. The repository+service pattern, on the other hand, tries to abstract this decision away by exposing abstract and powerful ways of accessing data. One way out would be to simply "dataloaderify" every method of your service. Unfortunately, in practice this is not really feasible. Let's explore why!

Dataloader is made for key-value lookups

Dataloader batches loads and keeps a promise cache to reduce duplicated calls to the database. For this cache to work, every request needs to be a simple key-value lookup (e.g. userByIdLoader, postsByUserIdLoader). This quickly becomes insufficient; in one of your examples, the request to the repository has a lot of parameters:

this.repository.find(postId, dateStart, dateEnd, skip, limit);

Sure, technically you could make { postId, dateStart, dateEnd, skip, limit } your key and then somehow hash its content to generate a unique key.
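By default DataLoader's cache compares keys by identity (it is backed by a Map), so two separately built objects with the same fields count as different keys and never share a cached promise. The library's `cacheKeyFn` option is where the "hash the content" step would go. A minimal sketch, assuming a hypothetical `CommentsPageKey` type that mirrors the arguments of `repository.find(...)`:

```typescript
// Hypothetical composite key mirroring the arguments of repository.find(...).
interface CommentsPageKey {
  postId: string;
  dateStart: Date;
  dateEnd: Date;
  skip: number;
  limit: number;
}

// Serialize the fields in a fixed order so that structurally equal keys
// produce the same string, regardless of object identity or property order.
function commentsPageCacheKey(key: CommentsPageKey): string {
  return JSON.stringify([
    key.postId,
    key.dateStart.toISOString(),
    key.dateEnd.toISOString(),
    key.skip,
    key.limit,
  ]);
}

// Intended usage with the real library (not executed here):
//   new DataLoader(batchFn, { cacheKeyFn: commentsPageCacheKey })
```

This only makes the cache work. The batch function behind the loader still has to answer many such keys in a single query, which is the hard part.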

Writing Dataloader queries is an order of magnitude harder than normal queries

When you implement a dataloader query, it suddenly has to work for a list of the inputs that the original query needed. Here is a simple SQL example:

SELECT * FROM user WHERE id = ?
-- Dataloaded
SELECT * FROM user WHERE id IN ?
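On the application side, the `IN` query is wrapped in a batch function. DataLoader requires that function to return one result per key, in the same order as the keys, while a database returns `IN (...)` rows in arbitrary order and simply omits ids it cannot find, so the rows have to be re-aligned. A minimal sketch, with `findUsersByIds` and its sample data as hypothetical stand-ins for the SQL query:

```typescript
interface User {
  id: number;
  name: string;
}

// Hypothetical stand-in for `SELECT * FROM user WHERE id IN (...)`.
// Like a real database, it returns rows in arbitrary order and skips unknown ids.
const USERS: User[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
  { id: 3, name: "Linus" },
];

async function findUsersByIds(ids: readonly number[]): Promise<User[]> {
  return USERS.filter((user) => ids.includes(user.id)).reverse();
}

// Batch function in the shape DataLoader expects: one query for all keys,
// then results re-aligned so that result[i] belongs to ids[i] (null if missing).
async function batchUsersById(ids: readonly number[]): Promise<(User | null)[]> {
  const rows = await findUsersByIds(ids);
  const byId = new Map<number, User>();
  for (const row of rows) {
    byId.set(row.id, row);
  }
  return ids.map((id) => byId.get(id) ?? null);
}
```

With the real library this would be used as `new DataLoader(batchUsersById)`.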

Okay, now the repository example from above:

SELECT * FROM comment WHERE post_id = ? AND date > ? AND date < ? ORDER BY date OFFSET ? LIMIT ?
-- Dataloaded
???

I have sometimes written queries that work for just two parameters, and even those quickly became very difficult. This is why most dataloaders are simple load-by-id lookups. This thread on Twitter discusses how a GraphQL API should only expose what can be efficiently queried. If you create service methods with powerful filter parameters, you have the same problem even if your GraphQL API does not expose these filters.

Okay so what is the solution?

As far as I understand, the first thing Facebook does is match fields and service methods very closely. You could do the same. That way you can decide inside the service method whether to use a dataloader or not. For example, I don't use dataloaders in root queries (e.g. { getPosts(filter: { createdBefore: "...", user: 234 }) { .. } }) but in subfields of types that appear in lists ({ getAllPosts { comments { ... } } }). A root query is not executed in a loop and is therefore not exposed to the N+1 problem.
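To make the N+1 argument concrete, here is a small sketch with a hypothetical in-memory "database" that counts the queries it receives. Resolving the comments of every post field by field costs one query for the root field plus one per post; collecting the post ids first (the step DataLoader automates across separate resolver calls) reduces that to two. All names and data here are illustrative assumptions:

```typescript
interface Post {
  id: number;
  title: string;
}

interface PostComment {
  id: number;
  postId: number;
  text: string;
}

// Hypothetical in-memory "database" that counts every query it receives.
let queryCount = 0;
const POSTS: Post[] = [
  { id: 1, title: "First" },
  { id: 2, title: "Second" },
  { id: 3, title: "Third" },
];
const COMMENTS: PostComment[] = [
  { id: 10, postId: 1, text: "a" },
  { id: 11, postId: 1, text: "b" },
  { id: 12, postId: 3, text: "c" },
];

function findAllPosts(): Post[] {
  queryCount++;
  return POSTS;
}

function findCommentsByPostId(postId: number): PostComment[] {
  queryCount++;
  return COMMENTS.filter((comment) => comment.postId === postId);
}

function findCommentsByPostIds(postIds: readonly number[]): PostComment[] {
  queryCount++;
  return COMMENTS.filter((comment) => postIds.includes(comment.postId));
}

// getAllPosts { comments } resolved field by field: the root field runs once,
// the comments field once per post -> 1 + N queries.
function resolveNaively(): PostComment[][] {
  return findAllPosts().map((post) => findCommentsByPostId(post.id));
}

// Same result, but the post ids are collected first and looked up in one
// batch, which is what a dataloader on the list subfield automates -> 2 queries.
function resolveBatched(): PostComment[][] {
  const posts = findAllPosts();
  const comments = findCommentsByPostIds(posts.map((post) => post.id));
  return posts.map((post) => comments.filter((comment) => comment.postId === post.id));
}
```

In both versions the root field stays a plain repository call; only the list subfield benefits from batching.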

Your repository then exposes only what can be "efficiently queried" (as in Lee's tweet), such as foreign/primary key lookups or filtered find-all queries. The service can then wrap, for example, the key lookups in a dataloader. Often I end up filtering small lists in my business logic. I think this is perfectly fine for small apps but might become a problem as you scale. The GraphQL Relay helpers for JavaScript do something similar in their connectionFromArray function: the pagination is not done at the database level, and this is probably fine for 90% of connections.
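One possible way to apply this to the commentsPage field from the question is to put the loader around the whole service method: a single repository query fetches every comment of the batched posts, and the count and the page are computed in memory, so #getCount and #find no longer need loaders of their own. A minimal sketch under that assumption; `findCommentsForPosts`, `CommentRow` and the sample rows are hypothetical, and loading every comment of a post is only reasonable while those lists stay small:

```typescript
interface CommentRow {
  id: number;
  postId: string;
  date: Date;
}

// Hypothetical composite key: the arguments of the question's getCommentsPage(...).
interface CommentsPageKey {
  postId: string;
  dateStart: Date;
  dateEnd: Date;
  skip: number;
  limit: number;
}

interface CommentsPage {
  total: number;
  items: CommentRow[];
}

// Hypothetical in-memory table standing in for the comment table.
const COMMENT_ROWS: CommentRow[] = [
  { id: 1, postId: "p1", date: new Date("2019-01-05T00:00:00Z") },
  { id: 2, postId: "p1", date: new Date("2019-01-10T00:00:00Z") },
  { id: 3, postId: "p1", date: new Date("2019-01-20T00:00:00Z") },
  { id: 4, postId: "p1", date: new Date("2019-02-10T00:00:00Z") },
  { id: 5, postId: "p2", date: new Date("2019-01-15T00:00:00Z") },
];

// Hypothetical repository method: one "filtered find all" query for all posts
// in the batch, e.g. SELECT * FROM comment WHERE post_id IN (...) ORDER BY date.
async function findCommentsForPosts(postIds: readonly string[]): Promise<CommentRow[]> {
  return COMMENT_ROWS.filter((row) => postIds.includes(row.postId)).sort(
    (a, b) => a.date.getTime() - b.date.getTime(),
  );
}

// Batch function for a loader around the whole service method: one query,
// then date filtering, counting and skip/limit are done in memory per key.
async function batchCommentsPages(keys: readonly CommentsPageKey[]): Promise<CommentsPage[]> {
  const postIds = Array.from(new Set(keys.map((key) => key.postId)));
  const rows = await findCommentsForPosts(postIds);
  return keys.map((key) => {
    const matching = rows.filter(
      (row) =>
        row.postId === key.postId &&
        row.date.getTime() > key.dateStart.getTime() &&
        row.date.getTime() < key.dateEnd.getTime(),
    );
    return { total: matching.length, items: matching.slice(key.skip, key.skip + key.limit) };
  });
}
```

Because the keys are objects, a real `new DataLoader(batchCommentsPages, ...)` would also need a `cacheKeyFn` that serializes the key; otherwise identical requests would not share a cached result.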

Some sources to consider

GraphQL before GraphQL - Dan Schafer
Dataloader source code walkthrough - Lee Byron
There is another talk from this year's GraphQL Conf that discusses data access at FB, but I don't think it has been uploaded yet. I might come back when it has been published.


answered Oct 20 '22 04:10

Herku
