Using GraphQL and relational databases

Question

I've been using GraphQL for about 2 years now. I've been using relational dbs for a long, long time. What I don't understand is the intended way to use the two together. The summary of my problem is essentially how GQL allows you to ask for what you want, and how that is supposed to translate to your API & DB operations.

Suppose you have a set of user tables. Let's imagine there are many, many tables.

|             User                |
id | firstname | lastname | age 
-----------------------------------
1  | Billy     | Bones    | 23


|             UserAddress             |
id | userid | country | street | postal
-----------------------------------------
1  | 1      | USA     | 123 str| 90120


|             UserAccount              |
id | userid | email   | password | date
-----------------------------------------
1  | 1      | [email protected] | 12345    | 1614...

Now you have a GQL resolver

class UserResolver {
  // Query
  getUser(id){
    return database.getUser(id)
  }

  // Field resolver
  address(userId){
    return database.getUserAddress(userId)
  }

  // Field resolver
  account(userId){
    return database.getUserAccount(userId)
  }

  // More field resolvers...
}

I love this design because you can ask for what you want. But ultimately, for database and even just API performance, it would be terrible. What you'd really want is for getUser to return a left join on all user tables, then use GQL naturally to select what fields you want, right? But what if in 90% of cases, all your frontend or consumers ever care about is the user's first and last name? I'm struggling to understand how to maintain usability with GQL and performance with SQL (or even non sql).

This is an extremely simplified example. There would be cases where on more complex objects and tables, the joins may be more involved and the data requested from a consumer may be spread across a few tables. It's so easy to just write one REST point dedicated to that query, but here it feels like we are trying to support "everything" which would be so easy with field resolvers that just make new DB requests, but somehow be performant at the same time.

Michel Floyd · Accepted Answer

Transforming a SQL data model into GQL (GraphQL) starts with creating types - queries come later.

GQL types may or may not map 1:1 with SQL tables, there's a tremendous amount of flexibility in how one models the world in GQL.

In your example you could start with a unified User type:

type User {
  id: ID!
  firstName: String!
  lastName: String!
  age: Int
  email: String
  streetAddress: String
  state: String
  postalCode: String
}

Note first that the password field is not exposed in the User type because there is no use case where the user's password should ever be exposed to the front end. (Also btw you don't want to store the password in your db ever)

Now that I have a type, I can start writing queries. Let's start with getUser - for this query an id is required and a User type is returned:

type Query {
  getUser(id: ID!): User
}

Your resolver code for this query would be simply:

const getUser = (_,{ id} ) => db('User').where({ id }).first()

The specifics of your query syntax will depend on your DB and ORM.

Now you're going to say But that's only going to return the name and age fields, what about the rest of them? That's where field resolvers come into play.

export const User = {
  email: ({ id }) => db('UserAccount').where({ userId: id }).first()?.email
  streetAddress: ({ id }) => db('UserAddress').where({ userId: id }).first()?.street
  state: ({ id }) => db('UserAddress').where({ userId: id }).first()?.state
  postalCode: ({ id }) => db('UserAddress').where({ userId: id }).first()?.postal
}

Now on the client let's say you want to get several of these fields. You write a query:

query myQuery($userId: ID!) {
  getUser(id: $userId) {
    id
    firstName
    lastName
    postalCode
  }
}

When you execute this query:

The query will be validated - if you make mistakes in your query formatting, spelling, or ask for non-existent fields you'll typically get a 400 error.
The server will execute the getUser query and put the result into the parent argument of the User field resolvers
The postalCode resolver will run a second query using the id of the parent as the userId to look for in the UserAddress table.

In this case since we didn't ask for the email address the UserAccount table is never queried. Notice that we didn't run a single JOIN anywhere, instead we did consecutive searches on related tables. We could also have joined the 3 tables in the getUser query and returned all the fields into the parent object and then selected from them depending on our preference but that's your choice in how you implement the SQL->GQL mapping.

If your concerned about performance and scalability consider that facebook, Netflix, Twitter, and many many other global-scale sites use GraphQL every day to query exabyte-scale distributed databases with 10,000s of tables.

I've written a series of posts GraphQL for SQL Developers on this topic which you might find helpful.

Herku · Answer

First, I would consider the following: The benefit of GraphQL is not necessarily to remove overfetching or request waterfalls, it is about moving them to the server. You already get most of the performance benefits because the network between your application and your database is probably very fast, they might even run on the same computer. And let me assure you, that this is usually fine. Before you optimise from here, consider measuring first - the number one rule of performance optimisation.

Then, most companies do one of the following two optimisations:

Dataloader

Dataloader is a utility that allows you to implicitly batch requests. So, in your case, if you call database.getUserAddress five times, it will result in a single query:

SELECT * FROM "UserAddress" WHERE userid IN (1, 2, 3, 4, 5)

For data fetching, dataloader often performs nearly as good as joins (as mentioned above a roundtrip to the database is likely very cheap).

Selection set

It is possible to get the list of selected fields from a selection set (in your case User). This list can be used to make selections and joins on demand. See my answer here: How to get requested fields inside GraphQL resolver?

Combining the two approaches is rather tricky. You can read more in my response here: Passing down arguments using Facebook's DataLoader

Conclusion

I would say that in the end you are already doing better than dogmatic REST:

GET /users
GET /user/1
GET /user/2
GET /user/3
GET /user/4
GET /user/1/address
GET /user/2/address
GET /user/3/address
GET /user/4/address
GET /account/1
GET /account/2
GET /account/3
GET /account/4

Now you could argue that you can just write the /users endpoint to return all this data with some neat joins for you and execute a single SQL statement. I think that if you are in the situation, where this works (e.g. you just have one client, all team members write frontend and backend code, etc.), then you probably don't need GraphQL. Check out trpc or something for type safety or use React server components or Next.js/Remix or whatever is new and shiny right now.

Using GraphQL and relational databases

Tags:

graphql

radop33392

2 Answers

Michel Floyd

Dataloader

Selection set

Conclusion

Herku

Recent Activity

Donate For Us

Using GraphQL and relational databases

Tags:

graphql

radop33392

2 Answers

Michel Floyd

Dataloader

Selection set

Conclusion

Herku

Related questions

Recent Activity

Donate For Us