I'm using DataLoader for batching the requests/queries together.
In my loader function I need to know the requested fields, so that I can avoid a SELECT * FROM query and instead run a SELECT field1, field2, ... FROM query.
...
What would be the best approach using DataLoader to pass down the resolveInfo needed for it? (I use resolveInfo.fieldNodes to get the requested fields.)
At the moment, I'm doing something like this:
await someDataLoader.load({ ids, args, context, info });
and then in the actual loaderFn:
const loadFn = async options => {
  const ids = [];
  let args;
  let context;
  let info;
  options.forEach(a => {
    ids.push(a.ids);
    if (!args && !context && !info) {
      args = a.args;
      context = a.context;
      info = a.info;
    }
  });
  return Promise.resolve(await new DataProvider().get({ ...args, ids }, context, info));
};
but as you can see, it's hacky and doesn't really feel good...
Does anyone have an idea how I could achieve this?
I am not sure there is a good answer to this question, simply because Dataloader is not made for this use case. That said, I have worked extensively with Dataloader, written similar implementations, and explored similar concepts in other programming languages.
Let's understand why Dataloader is not made for this use case and how we could still make it work (roughly like in your example).
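Dataloader is made for simple key-value lookups. That means that, given a key like an ID, it will load the value behind it. For that it assumes that the object behind the ID will always be the same until it is invalidated. This is the single assumption that enables the power of Dataloader. Without it, the three key features of Dataloader no longer work: batching (multiple requests are combined into one query), deduplication (two requests for the same key result in one query), and caching (consecutive requests for the same key do not trigger another query).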
This leads us to the following two important rules if we want to maximise the power of Dataloader:
Two different entities cannot share the same key, otherwise we might return the wrong entity. This sounds trivial, but it is not in your example. Let's say we want to load a user with ID 1 and the fields id and name. A little bit later (or at the same time) we want to load the user with ID 1 and the fields id and email. These are technically two different entities and they need to have different keys.
The same entity should have the same key all the time. Again, this sounds trivial but really is not in the example. The user with ID 1 and fields id and name should be the same as the user with ID 1 and fields name and id (notice the order).
In short a key needs to have all the information needed to uniquely identify an entity but not more than that.
await someDataLoader.load({ ids, args, context, info });
In your question you pass a few more things to your Dataloader as part of the key. First, I would not put args and context into the key. Does your entity change when the context changes (e.g. you are querying a different database now)? Probably yes, but do you want to account for that in your dataloader implementation? I would instead suggest creating new dataloaders for each request, as described in the docs.
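As a rough illustration of per-request loaders, here is a minimal sketch. It assumes a factory `createLoaders(db)` that returns fresh loader instances on every call. The names `buildContext`, `req` and `db` are hypothetical, and the factory is passed in as a parameter only to keep the snippet self-contained.

```javascript
// Hypothetical helper: builds the GraphQL context for ONE incoming request.
// Because createLoaders(db) is called on every request, no loader cache is
// shared between requests, so context never has to be part of a key.
function buildContext(req, db, createLoaders) {
  return {
    req,
    db,
    // e.g. { userLoader }, created fresh for this request
    ...createLoaders(db),
  };
}
```

Most GraphQL servers accept a function that builds the context per request; how it is wired up depends on the server library you use.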
Should the whole request info be in the key? No, but we need the fields that are requested. Apart from that your provided implementation is wrong and would break when the loader is called with two different resolve infos. You only set the resolve info from the first call but really it might be different on each object (think about the first user example above). Ultimately we could arrive at the following implementation of a dataloader:
const DataLoader = require('dataloader');
// https://www.npmjs.com/package/graphql-fields
const graphqlFields = require('graphql-fields');

// Creates a unique cache key for each combination of id and selected
// fields, independent of field order and duplicates
function cacheKeyFn({ id, fields }) {
  const sortedFields = [...new Set(fields)].sort().join(';');
  return `${id}[${sortedFields}]`;
}

function createLoaders(db) {
  const userLoader = new DataLoader(async keys => {
    // Collect the union of all requested fields; always include id so
    // that rows can be matched back to their keys
    const fields = new Set(['id']);
    keys.forEach(key => key.fields.forEach(field => fields.add(field)));
    // Get all our ids for the DB query
    const ids = keys.map(key => key.id);
    // Please be aware of possible SQL injection, don't copy + paste
    const rows = await db.query(`
      SELECT
        ${[...fields].join()}
      FROM
        user
      WHERE
        id IN (${ids.join()})
    `);
    // DataLoader expects one result per key, in the same order as keys
    // (this assumes db.query resolves to an array of row objects)
    const rowsById = new Map(rows.map(row => [String(row.id), row]));
    return keys.map(key => rowsById.get(String(key.id)) || null);
  }, { cacheKeyFn });
  return { userLoader };
}

// now in a resolver
resolve(parent, args, ctx, info) {
  return ctx.userLoader.load({ id: args.id, fields: Object.keys(graphqlFields(info)) });
}
This is a solid implementation but it has a few weaknesses. First, we are overfetching a lot of fields if we have different field requirements in the same batch request. Second, if we have fetched an entity with key 1[id;name] (as produced by the cache key function), we could also answer (at least in JavaScript) keys 1[id] and 1[name] with that object. Here we could build a custom map implementation that we could supply to Dataloader. It would be smart enough to know these things about our cache.
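To make the second point more concrete, here is a minimal sketch of such a map. It assumes cache keys in the `id[field1;field2]` format produced by the cache key function above, and it implements the `get`/`set`/`delete`/`clear` methods that DataLoader expects from a custom `cacheMap`. The class name `SubsetAwareCacheMap` is made up for this example, and the lookup is a simple linear scan, not a tuned implementation.

```javascript
// Hypothetical cache map: a lookup for "1[id]" is answered by an entry stored
// under "1[id;name]", because the stored field set covers the requested one.
class SubsetAwareCacheMap {
  constructor() {
    // id -> Map(cacheKey -> { fields: Set<string>, value })
    this.byId = new Map();
  }

  // Splits "1[id;name]" into { id: "1", fields: ["id", "name"] }
  static parse(cacheKey) {
    const match = /^(.*)\[(.*)\]$/.exec(cacheKey);
    if (!match) throw new Error(`Unexpected cache key: ${cacheKey}`);
    const [, id, list] = match;
    return { id, fields: list === '' ? [] : list.split(';') };
  }

  get(cacheKey) {
    const { id, fields } = SubsetAwareCacheMap.parse(cacheKey);
    const entries = this.byId.get(id);
    if (!entries) return undefined;
    for (const entry of entries.values()) {
      // Reuse any cached value whose fields are a superset of the request
      if (fields.every(field => entry.fields.has(field))) return entry.value;
    }
    return undefined;
  }

  set(cacheKey, value) {
    const { id, fields } = SubsetAwareCacheMap.parse(cacheKey);
    if (!this.byId.has(id)) this.byId.set(id, new Map());
    // DataLoader stores the promise for the value, not the value itself
    this.byId.get(id).set(cacheKey, { fields: new Set(fields), value });
    return this;
  }

  delete(cacheKey) {
    const { id } = SubsetAwareCacheMap.parse(cacheKey);
    const entries = this.byId.get(id);
    if (!entries) return false;
    const deleted = entries.delete(cacheKey);
    if (entries.size === 0) this.byId.delete(id);
    return deleted;
  }

  clear() {
    this.byId.clear();
  }
}
```

It would be passed next to the cache key function, for example `{ cacheKeyFn, cacheMap: new SubsetAwareCacheMap() }`. Whether the extra bookkeeping pays off depends on how often field sets actually overlap.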
We see that this is really a complicated matter. I know it is often listed as a benefit of GraphQL that you don't have to fetch all fields from a database for every query, but the truth is that in practice this is seldom worth the hassle. Don't optimise what is not slow. And even if it is slow, is it a bottleneck?
My suggestion is: Write trivial Dataloaders that simply fetch all (needed) fields. If you have one client, it is very likely that for most entities the client fetches all fields anyway, otherwise they would not be part of your API, right? Then use something like query introspection to measure slow queries and find out which field exactly is slow. Then you optimise only the slow thing (see for example my answer here that optimises a single use case). And if you are a big e-commerce platform, please don't use Dataloader for this. Build something smarter and don't use JavaScript.
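For comparison, a "trivial" loader along these lines could look roughly like the following sketch. It assumes a hypothetical `db.query(sql, params)` that resolves to an array of row objects and accepts `?` placeholders; both are assumptions about your database client, and `batchLoadUsers` is a made-up name.

```javascript
// Batch function for a plain id -> user loader: one query per batch, all
// columns, and the key is just the id, so caching works out of the box.
async function batchLoadUsers(db, ids) {
  // One placeholder per id, so values are never interpolated into the SQL
  const placeholders = ids.map(() => '?').join(', ');
  const rows = await db.query(
    `SELECT * FROM user WHERE id IN (${placeholders})`,
    [...ids]
  );
  // The batch function must return one entry per key, in key order
  const rowsById = new Map(rows.map(row => [row.id, row]));
  return ids.map(id => rowsById.get(id) || new Error(`No user with id ${id}`));
}
```

With the dataloader package this would be wrapped as `new DataLoader(ids => batchLoadUsers(db, ids))`, and resolvers would simply call `load(id)`.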