Passing down arguments using Facebook's DataLoader

I'm using DataLoader to batch requests/queries together. In my loader function I need to know the requested fields, so that instead of a SELECT * FROM query I can issue a SELECT field1, field2, ... FROM query...

What would be the best approach using DataLoader to pass down the resolveInfo needed for it? (I use resolveInfo.fieldNodes to get the requested fields)

At the moment, I'm doing something like this:

await someDataLoader.load({ ids, args, context, info });

and then in the actual loaderFn:

const loadFn = async options => {
    const ids = [];
    let args;
    let context;
    let info;

    options.forEach(a => {
        ids.push(a.ids);
        if (!args && !context && !info) {
            args = a.args;
            context = a.context;
            info = a.info;
        }
    });

    return Promise.resolve(await new DataProvider().get({ ...args, ids }, context, info));
};

but as you can see, it's hacky and doesn't really feel good...

Does anyone have an idea how I could achieve this?

asked Sep 21 '17 by Simon


1 Answer

I am not sure there is a good answer to this question, simply because Dataloader is not made for this use case, but I have worked extensively with Dataloader, written similar implementations, and explored similar concepts in other programming languages.

Let's understand why Dataloader is not made for this use case and how we could still make it work (roughly like in your example).

Dataloader is not made for fetching a subset of fields

Dataloader is made for simple key-value lookups. That means that, given a key like an ID, it will load the value behind it. For that it assumes that the object behind the ID will always be the same until it is invalidated. This single assumption is what enables the power of Dataloader. Without it, the three key features of Dataloader won't work anymore:

  1. Batching requests (multiple requests are done together in one query)
  2. Deduplication (requests to the same key twice result in one query)
  3. Caching (consecutive requests of the same key don't result in multiple queries)
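To make this concrete, here is a minimal sketch of the plain, ID-keyed usage Dataloader is built for (db.query is a hypothetical stand-in for your data access layer, and the query is simplified and not injection-safe):

const Dataloader = require('dataloader');

function createPlainUserLoader(db) {
  return new Dataloader(async ids => {
    // One batched query for all ids collected during this tick
    const rows = await db.query(
      `SELECT * FROM user WHERE id IN (${ids.join()})`
    );
    // The batch function must return one value per key, in the same order as the keys
    return ids.map(id => rows.find(row => row.id === id) || null);
  });
}

// load(1) twice is deduplicated and cached; load(1) and load(2) are batched into one query.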

This leads us to the following two important rules if we want to maximise the power of Dataloader:

Two different entities cannot share the same key, otherwise we might return the wrong entity. This sounds trivial, but it is not in your example. Let's say we want to load the user with ID 1 and the fields id and name. A little bit later (or at the same time) we want to load the user with ID 1 and the fields id and email. These are technically two different entities and they need to have different keys.

The same entity should have the same key all the time. Again, this sounds trivial, but it really is not in the example. The user with ID 1 and fields id and name should be the same as the user with ID 1 and fields name and id (notice the order).

In short, a key needs to have all the information needed to uniquely identify an entity, but not more than that.

So how do we pass down fields to Dataloader

await someDataLoader.load({ ids, args, context, info });

In your question you have provided a few more things to your Dataloader as a key. First, I would not put args and context into the key. Does your entity change when the context changes (e.g. you are querying a different database now)? Probably yes, but do you want to account for that in your dataloader implementation? I would instead suggest creating new dataloaders for each request, as described in the docs.
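For illustration, per-request loaders could be wired up roughly like this (a sketch assuming an Apollo-style server where the context function runs once per request; typeDefs, resolvers and db are placeholders, and createLoaders is the factory shown further below):

const { ApolloServer } = require('apollo-server');

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // A fresh set of loaders per request, so the cache never leaks between users
  context: ({ req }) => ({
    db,
    ...createLoaders(db),
  }),
});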

Should the whole request info be in the key? No, but we need the fields that are requested. Apart from that, your provided implementation is wrong and would break when the loader is called with two different resolve infos. You only set the resolve info from the first call, but it might really be different for each object (think about the first user example above). Ultimately, we could arrive at the following implementation of a dataloader:

const Dataloader = require('dataloader');
// https://www.npmjs.com/package/graphql-fields
const graphqlFields = require('graphql-fields');

// This function creates unique cache keys for different selected
// fields
function cacheKeyFn({ id, fields }) {
  const sortedFields = [...(new Set(fields))].sort().join(';');
  return `${id}[${sortedFields}]`;
}

function createLoaders(db) {
  const userLoader = new Dataloader(async keys => {
    // Create a set with all requested fields
    const fields = keys.reduce((acc, key) => {
      key.fields.forEach(field => acc.add(field));
      return acc;
    }, new Set());
    // Always select the id so the rows can be mapped back to the keys
    fields.add('id');
    // Get all our ids for the DB query
    const ids = keys.map(key => key.id);
    // Please be aware of possible SQL injection, don't copy + paste
    const result = await db.query(`
      SELECT
        ${[...fields].join(', ')}
      FROM
        user
      WHERE
        id IN (${ids.join()})
    `);
    // The batch function must return one value per key, in the same order
    // (this assumes db.query resolves to an array of row objects)
    return keys.map(key => result.find(row => row.id === key.id));
  }, { cacheKeyFn });

  return { userLoader };
}

// now in a resolver
resolve(parent, args, ctx, info) {
  return ctx.userLoader.load({ id: args.id, fields: Object.keys(graphqlFields(info)) });
}

This is a solid implementation, but it has a few weaknesses. First, we are over-fetching a lot of fields if we have different field requirements in the same batch request. Second, if we have fetched an entity under the cache key 1[id;name], we could also answer (at least in JavaScript) the keys 1[id] and 1[name] with that object. Here we could build a custom map implementation that we supply to Dataloader; it would be smart enough to know these things about our cache.
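To illustrate the idea, here is a rough, untested sketch of such a map (Dataloader accepts it through its cacheMap option; the keys it sees are the strings produced by the cacheKeyFn above, and the SupersetCacheMap name is just made up for this sketch):

// Parse a cache key of the form `${id}[${field;field;...}]` back into its parts
function parseKey(key) {
  const match = key.match(/^(.*)\[(.*)\]$/);
  if (!match) return { id: key, fields: [] };
  return { id: match[1], fields: match[2] ? match[2].split(';') : [] };
}

class SupersetCacheMap {
  constructor() {
    this.map = new Map();
  }

  get(key) {
    const exact = this.map.get(key);
    if (exact) return exact;
    // Fall back to any cached entry for the same id whose field set is a superset,
    // e.g. answer 1[id] or 1[name] from a cached 1[id;name]
    const { id, fields } = parseKey(key);
    for (const [storedKey, value] of this.map) {
      const stored = parseKey(storedKey);
      if (stored.id === id && fields.every(field => stored.fields.includes(field))) {
        return value;
      }
    }
    return undefined;
  }

  set(key, value) {
    this.map.set(key, value);
  }

  delete(key) {
    this.map.delete(key);
  }

  clear() {
    this.map.clear();
  }
}

// usage: new Dataloader(batchFn, { cacheKeyFn, cacheMap: new SupersetCacheMap() })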

Conclusion

We see that this is really a complicated matter. I know it is often listed as a benefit of GraphQL that you don't have to fetch all fields from a database for every query, but the truth is that in practice this is seldom worth the hassle. Don't optimise what is not slow. And even if it is slow, is it a bottleneck?

My suggestion is: write trivial Dataloaders that simply fetch all (needed) fields. If you have one client, it is very likely that for most entities the client fetches all fields anyway, otherwise they would not be part of your API, right? Then use something like query introspection to measure slow queries and find out which field exactly is slow. Then you optimise only the slow thing (see for example my answer here that optimises a single use case). And if you are a big ecommerce platform, please don't use Dataloader for this. Build something smarter and don't use JavaScript.

answered Oct 20 '22 by Herku