
I don't understand the GraphQL N+1 problem

I started learning GraphQL just yesterday. It's really interesting, and actually quite easy to learn and understand. While reading some articles I came across the N+1 problem, illustrated with this example:

Query

# getting the top 100 reviews
{
  top100Reviews {
    body
    author {
      name
    }
  }
}

Schema


const typeDefs = gql`
  type User {
    id: ID!
    name: String
  }
  type Review {
    id: ID!
    body: String
    author: User
    product: Product
  }
  type Query {
    top100Reviews: [Review]
  }
`;

And finally, the resolvers:

const resolver = {
  Query: {
    top100Reviews: () => get100Reviews(),
  },
  Review: {
    author: (review) => getUser(review.authorId),
  },
};

In the article, the author says:

When we execute the following query to get the top 100 reviews and the corresponding author names, we first make a single call to retrieve 100 records of review from database and then for each review, we make another call to the database to fetch the user details given the author ID.

Can't we just remove the Review resolver and do a simple JOIN (assuming SQL) inside the get100Reviews method?

I don't get why we would write the Review resolver and end up with the N+1 problem when we could just do a simple JOIN in the Query resolver.

Am I understanding GraphQL correctly?

Could someone please shed some light on this?

Thanks!

asked Mar 24 '20 13:03 by wassimbj




2 Answers

You are correct -- using a join would let you make a single database query instead of 101.
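To make the 101-versus-1 count concrete, here is a minimal sketch. It assumes an in-memory stand-in for the database: `query`, `resolveNaively`, and `resolveWithJoin` are hypothetical names invented for illustration, and every call to `query` is counted as one round trip.

```javascript
// Hypothetical in-memory "database"; each call to query() counts as one round trip.
let queryCount = 0;

const users = Array.from({ length: 10 }, (_, i) => ({
  id: i + 1,
  name: `user${i + 1}`,
}));

const reviews = Array.from({ length: 100 }, (_, i) => ({
  id: i + 1,
  body: `review ${i + 1}`,
  authorId: (i % 10) + 1,
}));

function query(fn) {
  queryCount += 1;
  return fn();
}

// Per-field resolution, as in the question: 1 query for the list, then 1 per review.
function resolveNaively() {
  const list = query(() => reviews.slice());
  return list.map((review) => ({
    body: review.body,
    author: query(() => users.find((u) => u.id === review.authorId)),
  }));
}

// JOIN-style resolution: a single query returns reviews with authors attached.
function resolveWithJoin() {
  return query(() =>
    reviews.map((review) => ({
      body: review.body,
      author: users.find((u) => u.id === review.authorId),
    }))
  );
}

queryCount = 0;
const naive = resolveNaively();
const naiveQueries = queryCount;

queryCount = 0;
const joined = resolveWithJoin();
const joinQueries = queryCount;

console.log({ naiveQueries, joinQueries });
```

Both functions return the same data; only the number of round trips differs.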

The problem is that in practice you wouldn't have just one join. Your review data model might include associations with any number of other models, each requiring its own join clause. Not only that, but those models might have relationships to other models themselves. Trying to craft a single SQL query that accounts for all possible GraphQL queries becomes not only difficult but also prohibitively expensive. A client might request only the reviews with none of their associated models, but the query to fetch those reviews now includes 30 additional, unnecessary joins. That query might have taken less than a second but now takes 10.

Consider also that relationships between types can be circular:

{
  reviews {
    author {
      reviews {
        author {
          name
        }
      }
    }
  }
}

In this case, the depth of a query is indeterminate and it is impossible to create a single SQL query that would accommodate any possible GraphQL query.

Using a library like dataloader lets us alleviate the N+1 problem through batching while keeping each individual SQL query as lean as possible. That said, you'll still end up with multiple queries. An alternative approach is to use the GraphQLResolveInfo object passed to the resolver to determine which fields were actually requested, and then make only the necessary joins in your query. However, parsing the info object and constructing that sort of query can be a daunting task, especially once you start dealing with deeply nested associations. By comparison, dataloader is the simpler and more intuitive solution.
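The batching idea can be sketched roughly as follows. This is a stripped-down stand-in written for illustration, not the dataloader library itself: `createBatchLoader`, `getUsersByIds`, and `run` are hypothetical names, and the data is in memory. Keys requested in the same tick are collected and passed to one batch function, which you can think of as a single `SELECT ... WHERE id IN (...)`.

```javascript
// Simplified stand-in for the batching idea (NOT the dataloader library).
function createBatchLoader(batchFn) {
  let pending = []; // { key, resolve, reject } entries collected during the current tick

  async function flush() {
    const batch = pending;
    pending = [];
    const uniqueKeys = [...new Set(batch.map((item) => item.key))];
    try {
      // batchFn must return values in the same order as uniqueKeys.
      const values = await batchFn(uniqueKeys);
      const byKey = new Map(uniqueKeys.map((key, i) => [key, values[i]]));
      batch.forEach((item) => item.resolve(byKey.get(item.key)));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        // The first key of a batch schedules a single flush for the whole batch.
        if (pending.length === 0) process.nextTick(flush);
        pending.push({ key, resolve, reject });
      });
    },
  };
}

// Hypothetical in-memory data; queryCount counts simulated database round trips.
let queryCount = 0;
const users = [
  { id: 1, name: "Ann" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Cy" },
];
const reviews = Array.from({ length: 100 }, (_, i) => ({
  id: i + 1,
  body: `review ${i + 1}`,
  authorId: (i % 3) + 1,
}));

// Hypothetical batch query: one round trip for all requested ids.
async function getUsersByIds(ids) {
  queryCount += 1;
  return ids.map((id) => users.find((u) => u.id === id));
}

const userLoader = createBatchLoader(getUsersByIds);

const resolvers = {
  Query: {
    top100Reviews: async () => {
      queryCount += 1;
      return reviews;
    },
  },
  Review: {
    // Same shape as the resolver in the question, but routed through the loader.
    author: (review) => userLoader.load(review.authorId),
  },
};

// Mimics what a GraphQL executor does: resolve the list, then each item's author.
async function run() {
  const list = await resolvers.Query.top100Reviews();
  const authors = await Promise.all(
    list.map((review) => resolvers.Review.author(review))
  );
  return { authors, queryCount };
}
```

Under these assumptions, calling `run()` issues two simulated queries in total (one for the reviews, one batched lookup for all authors) instead of 101. With the real dataloader package the resolver looks much the same: you construct a loader with `new DataLoader(batchFn)` and call `loader.load(key)`, typically creating a fresh loader per request so cached values are not shared between users.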

answered Sep 21 '22 21:09 by Daniel Rearden


I just wrote a package that I believe can solve the N+1 problem in most cases for GraphQL on Node.js. Check it out: https://github.com/oney/sequelize-proxy

It basically uses data loaders to batch multiple queries into a single one, but it also leverages the features and association definitions in Sequelize to make the batching more accurate and efficient.

answered Sep 18 '22 21:09 by user2790103