
GraphQL Dataloader vs Mongoose Populate

To perform a join-like operation, we can use either Mongoose's populate method or a GraphQL resolver backed by a DataLoader.

Before asking any question, I would like to give the following example of Task/Activities (none of this code is tested, it is given just for the example's sake):

Task {
  _id,
  title,
  description,
  activities: [{ //Of Activity Type
    _id,
    title
  }]
}

In mongoose, we can retrieve the activities related to a task with the populate method, with something like this:

const task = await TaskModel.findById(taskId).populate('activities');

Using GraphQL and Dataloader, we can have the same result with something like:

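const DataLoader = require('dataloader');
// Batch function: receives every task id requested in the same tick and
// must resolve to one array of activities per id, in the same order.
const getActivitiesByTask = async (taskIds) => {
    const activities = await ActivityModel.find({ task: { $in: taskIds } });
    return taskIds.map((taskId) =>
        activities.filter((activity) => String(activity.task) === String(taskId)));
};
const dataloaders = () => ({
    activitiesByTask: new DataLoader(getActivitiesByTask),
});
// ...
// SET The dataloader in the context
// ...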

//------------------------------------------
// In another file
const resolvers = {
    Query: {
        // Returning the query is enough; GraphQL awaits thenables for us.
        Task: (_, { id }) => TaskModel.findById(id),
    },
    Task: {
        activities: (task, _, context) => context.dataloaders.activitiesByTask.load(task._id),
    },
};

I looked for an article comparing the two approaches in terms of performance, resource usage, and so on, but could not find one.

Any insight would be helpful, thanks!

Strider asked Oct 05 '18 14:10




1 Answer

It's important to note that dataloaders are not just an interface for your data models. While dataloaders are touted as a "simplified and consistent API over various remote data sources" -- their main benefit when coupled with GraphQL comes from being able to implement caching and batching within the context of a single request. This sort of functionality is important in APIs that deal with potentially redundant data (think about querying users and each user's friends -- there's a huge chance of refetching the same user multiple times).
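To make the batching and caching concrete, here is a minimal sketch of the idea. TinyLoader is a hypothetical, simplified stand-in written for illustration, not the dataloader library's actual implementation; the user data and the batchCalls log are made up as well.

```javascript
// Hypothetical, simplified stand-in for a DataLoader-style loader.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values, in key order
    this.cache = new Map(); // key -> Promise; meant to live for one request
    this.queue = [];        // loads waiting for the next batch
  }

  load(key) {
    // Caching: a repeated key reuses the promise created the first time.
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Batching: the first key queued in a tick schedules one dispatch.
      if (this.queue.length === 1) process.nextTick(() => this.dispatch());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Made-up batch function that records every call it receives.
const batchCalls = [];
const userLoader = new TinyLoader(async (ids) => {
  batchCalls.push(ids);
  return ids.map((id) => ({ id, name: `user-${id}` }));
});

// Three loads in the same tick, one of them a duplicate.
const demo = Promise.all([
  userLoader.load(1),
  userLoader.load(2),
  userLoader.load(1),
]);
demo.then(() => console.log('batch calls:', batchCalls));
```

The three loads end up in a single call to the batch function with the keys 1 and 2; the duplicate load never reaches it. Creating a new loader per request keeps the cache from leaking stale data across requests.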

On the other hand, mongoose's populate method is really just a way of aggregating multiple MongoDB requests. In that sense, comparing the two is like comparing apples and oranges.

A more fair comparison might be using populate as illustrated in your question as opposed to adding a resolver for activities along the lines of:

activities: (task, _, context) => Activity.find().where('_id').in(task.activities)

Either way, the question comes down to whether you load all the data in the parent resolver or let the resolvers further down do some of the work. Because resolvers are only called for fields that are included in the request, the two approaches can differ significantly in performance.

If the activities field is requested, both approaches will make the same number of roundtrips between the server and the database -- the difference in performance will probably be marginal. However, your request might not include the activities field at all. In that case, the activities resolver will never be called and we can save one or more database requests by creating a separate activities resolver and doing the work there.
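The difference in query counts can be sketched with a toy simulation. Everything here is hypothetical: db is an in-memory stand-in that only counts queries, and execute is a tiny substitute for a GraphQL executor that calls the activities resolver only when the field was requested.

```javascript
// Hypothetical in-memory stand-in for the database; counts every query.
let queryCount = 0;
const db = {
  tasks: { t1: { _id: 't1', title: 'Write report', activities: ['a1', 'a2'] } },
  activities: {
    a1: { _id: 'a1', title: 'Draft' },
    a2: { _id: 'a2', title: 'Review' },
  },
  findTask(id) {
    queryCount += 1;
    return this.tasks[id];
  },
  findActivities(ids) {
    queryCount += 1;
    return ids.map((id) => this.activities[id]);
  },
};

// Approach A: the parent resolver eagerly loads activities (populate-style).
const eagerResolvers = {
  task: (id) => {
    const task = db.findTask(id);
    return { ...task, activities: db.findActivities(task.activities) };
  },
  activities: (task) => task.activities,
};

// Approach B: activities are fetched only by their own field resolver.
const lazyResolvers = {
  task: (id) => db.findTask(id),
  activities: (task) => db.findActivities(task.activities),
};

// Toy executor: resolves activities only if the request includes the field.
function execute(resolvers, id, requestedFields) {
  queryCount = 0;
  const task = resolvers.task(id);
  const result = { title: task.title };
  if (requestedFields.includes('activities')) {
    result.activities = resolvers.activities(task);
  }
  return { result, queries: queryCount };
}

const eagerWithout = execute(eagerResolvers, 't1', ['title']).queries;
const lazyWithout = execute(lazyResolvers, 't1', ['title']).queries;
const eagerWith = execute(eagerResolvers, 't1', ['title', 'activities']).queries;
const lazyWith = execute(lazyResolvers, 't1', ['title', 'activities']).queries;
console.log({ eagerWithout, lazyWithout, eagerWith, lazyWith });
```

In this sketch both approaches issue two queries when activities is requested, while only the separate resolver drops to one query when it is not.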

On a related note...

From what I understand, aggregating queries in MongoDB using something like $lookup is generally less performant than just using populate (some conversation on that point can be found here). In the context of relational databases, however, there are additional considerations when weighing the above approaches. That's because your initial fetch in the parent resolver could be done using joins, which will generally be much faster than making separate db requests. That means that, at the expense of making the queries without the activities field slower, you can make the other queries significantly faster.

Daniel Rearden answered Sep 24 '22 00:09