Looking at the DataLoader library, how is it caching and batching requests?
The instructions specify usage in the following way:
var DataLoader = require('dataloader')
var userLoader = new DataLoader(keys => myBatchGetUsers(keys));
userLoader.load(1)
  .then(user => userLoader.load(user.invitedByID))
  .then(invitedBy => console.log(`User 1 was invited by ${invitedBy}`));
// Elsewhere in your application
userLoader.load(2)
  .then(user => userLoader.load(user.lastInvitedID))
  .then(lastInvited => console.log(`User 2 last invited ${lastInvited}`));
But I am unclear how the load function is working, and what the myBatchGetUsers function might look like. Could you provide an example, if possible?
The DataLoader utility from Facebook works by combining input requests with a batch function that you have to provide. It only works with requests that use identifiers.
There are three phases:
1. Aggregation: any request on the Loader object is delayed until process.nextTick.
2. Batch: the Loader just calls the myBatchGetUsers function you have provided with the combination of all requested keys.
3. Split: the result is then split so that each input request gets its own part of the response.
That's why in your provided example you should have only two requests:
- one for users 1 and 2
- then one for the related users (invitedByID)
To implement this with mongodb for example, you should just define the myBatchGetUsers function to use the find method appropriately:
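function myBatchGetUsers(keys) {
  // usersCollection is a promisified mongodb collection
  return usersCollection
    .find({ _id: { $in: keys } })
    .toArray() // assumes find() returns a cursor, as in the official driver
    // DataLoader expects one result per key, in the same order as the keys
    .then(users => keys.map(
      key => users.find(user => String(user._id) === String(key)) || null
    ))
}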
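The batch function has to return a promise for an array with one entry per key, in the same order as the keys. As a database-free illustration, here is a minimal sketch of that contract. The USERS array and the batchGetUsersInMemory name are hypothetical stand-ins for a real data source, not part of DataLoader.

```javascript
// Hypothetical in-memory "table" standing in for a real database.
// Note that it is deliberately not sorted by id.
const USERS = [
  { id: 2, name: 'Bob' },
  { id: 1, name: 'Alice' }
]

// Returns a promise for an array aligned with `keys`:
// result[i] is the user for keys[i], or null when no user matches.
function batchGetUsersInMemory(keys) {
  const usersById = new Map(USERS.map(user => [user.id, user]))
  return Promise.resolve(keys.map(key => usersById.get(key) || null))
}

batchGetUsersInMemory([1, 3, 2]).then(users => {
  console.log(users.map(user => (user ? user.name : null)))
})
```

Mapping over the keys, rather than returning whatever the lookup produced, is what keeps the result aligned with the input even when the data source returns rows in a different order or is missing some of them.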
I found it helpful to recreate the part of dataloader that I use, to see one possible way that it could be implemented (in my case I only use the .load() function).
So, creating a new instance of the DataLoader constructor gives you 2 things: a list of keys (empty to begin with) and the batch loading function you passed in.
The constructor could look something like this:
function DataLoader (_batchLoadingFn) {
  this._keys = []
  this._batchLoadingFn = _batchLoadingFn
}
And instances of the DataLoader constructor have access to a .load() function, which needs to be able to access the _keys property. So it's defined on the DataLoader.prototype object:
DataLoader.prototype.load = function(key) {
  // this._keys references the array defined in the constructor function
}
When creating a new object via the DataLoader constructor (new DataLoader(fn)), the fn you pass it needs to fetch data from somewhere. It takes an array of keys as its argument and returns a promise that resolves to an array of values corresponding, index by index, to the initial array of keys.
For example, here is a dummy function that takes an array of keys, and passes the same array back but with the values doubled:
const batchLoadingFn = keys => new Promise( resolve => resolve(keys.map(k => k * 2)) )
keys: [1,2,3]
vals: [2,4,6]
keys[0] corresponds to vals[0]
keys[1] corresponds to vals[1]
keys[2] corresponds to vals[2]
Then every time you call the .load(identifier) function, you add a key to the _keys array, and at some point the batchLoadingFn is called and is passed the _keys array as an argument.
The trick is... how to call .load(id) many times but with the batchLoadingFn only executed once? This is cool, and the reason I explored how this library works.
I found that it's possible to do this by specifying that batchLoadingFn is executed after a timeout, but that if .load() is called again before the timeout fires, then the timeout is canceled, a new key is added and a call to batchLoadingFn is rescheduled. Achieving this in code looks like so:
DataLoader.prototype.load = function(key) {
  clearTimeout(this._timer)
  this._timer = setTimeout(() => this._batchLoadingFn(), 0)
}
Essentially, calling .load() deletes pending calls to batchLoadingFn, and then schedules a new call to batchLoadingFn at the back of the event loop. This guarantees that if .load() is called many times over a short space of time, batchLoadingFn will only be called once. This is very similar to what I think is called debouncing: the technique used when building websites and you want to do something on a mousemove event, but you get far more events than you want to deal with.
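The cancel-and-reschedule pattern described above can be shown on its own, independent of DataLoader. This is a minimal sketch; the debounce helper, the calls counter and the debounceDone promise are illustrative names, not part of the library.

```javascript
// Wraps fn so that a burst of calls results in a single execution:
// every call cancels the previously scheduled run and schedules a new one.
function debounce(fn, delayMs) {
  let timer
  return (...args) => {
    clearTimeout(timer)
    timer = setTimeout(() => fn(...args), delayMs)
  }
}

let calls = 0
const countCall = debounce(() => { calls += 1 }, 0)

// Three calls in the same tick: only the last scheduled run survives.
countCall()
countCall()
countCall()

// Check the counter a little later, once the pending timer has fired.
const debounceDone = new Promise(resolve => setTimeout(() => resolve(calls), 20))
debounceDone.then(total => console.log('fn was executed', total, 'time(s)'))
```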
But calling .load(key) also needs to push a key to the _keys array, which we can do in the body of the .load function by pushing the key argument to _keys (just this._keys.push(key)). However, the contract of the .load function is that it returns a promise for the single value that the key argument resolves to. At some point the batchLoadingFn will be called and get a result (it has to return a promise for an array of results that corresponds to the _keys), and each caller of .load then needs to receive its own value out of that array.
This next bit I thought was particularly clever (and well worth the effort of looking at the source code)!
The dataloader library, instead of keeping a plain list of keys in _keys, actually keeps a list of keys, each associated with a reference to a resolve function that, when called, results in a value being resolved as the result of .load(). .load() returns a promise, and a promise is resolved when its resolve function is invoked.
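To make the idea of holding on to a resolve function concrete, here is a minimal sketch in which a promise is created now and resolved later from outside its executor. The names pending and deferredLoad are hypothetical and only serve the illustration.

```javascript
// Queue of requests that have been made but not yet answered.
const pending = []

// Returns a promise immediately, and stores its resolve function
// next to the key so that it can be called at a later point.
function deferredLoad(key) {
  return new Promise(resolve => pending.push({ key, resolve }))
}

const promisedValue = deferredLoad('a')
promisedValue.then(value => console.log('resolved with', value))

// Later, once the value is known, calling the stored resolve function
// settles the promise that deferredLoad() returned earlier.
pending[0].resolve('value for a')
```

The promise executor runs synchronously, so the entry is already in the queue by the time deferredLoad() returns.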
So the _keys array ACTUALLY keeps a list of {key, resolve} pairs. And when the promise returned by your batchLoadingFn resolves, each resolve function is invoked with a value (that hopefully corresponds to the item in the _keys array via index number).
So the .load function looks like this (in terms of pushing a {key, resolve} pair to the _keys array):
DataLoader.prototype.load = function(key) {
  const promisedValue = new Promise ( resolve => this._keys.push({key, resolve}) )
  ...
  return promisedValue
}
And all that's left is to execute the batchLoadingFn with the keys from _keys as an argument, and invoke the correct resolve function when it returns:
// Take the current batch and reset the queue before the async call,
// so keys added while the batch is in flight are not lost
const batch = this._keys
this._keys = []
this._batchLoadingFn(batch.map(k => k.key))
  .then(values => {
    batch.forEach(({resolve}, i) => {
      resolve(values[i])
    })
  })
And combined, all the code to implement the above is here:
function DataLoader (_batchLoadingFn) {
  this._keys = []
  this._batchLoadingFn = _batchLoadingFn
}
DataLoader.prototype.load = function(key) {
  clearTimeout(this._timer)
  const promisedValue = new Promise ( resolve => this._keys.push({key, resolve}) )
  this._timer = setTimeout(() => {
    console.log('You should only see me printed once!')
    // Take the current batch and reset the queue before the async call,
    // so keys added while the batch is in flight are not lost
    const batch = this._keys
    this._keys = []
    this._batchLoadingFn(batch.map(k => k.key))
      .then(values => {
        batch.forEach(({resolve}, i) => {
          resolve(values[i])
        })
      })
  }, 0)
  return promisedValue
}
// Define a batch loading function
const batchLoadingFunction = keys => new Promise( resolve => resolve(keys.map(k => k * 2)) )
// Create a new DataLoader
const loader = new DataLoader(batchLoadingFunction)
// call .load() twice in quick succession
loader.load(1).then(result => console.log('Result with key = 1', result))
loader.load(2).then(result => console.log('Result with key = 2', result))
If I remember correctly, the dataloader library doesn't use setTimeout and instead uses process.nextTick, but I couldn't get that to work.
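For what it's worth, here is one way the same idea could be scheduled with process.nextTick instead of setTimeout. This is only a sketch under my own assumptions, not the library's actual code: the TickLoader name, its _queue property and the reject handling are all made up for illustration. Since a nextTick callback cannot be canceled the way a timeout can, only the first .load() of a batch schedules the dispatch.

```javascript
function TickLoader(batchLoadingFn) {
  this._batchLoadingFn = batchLoadingFn
  this._queue = []
}

TickLoader.prototype.load = function (key) {
  const promisedValue = new Promise((resolve, reject) => {
    this._queue.push({ key, resolve, reject })
  })
  // The first load() of a batch schedules the dispatch; later calls
  // in the same tick just add themselves to the queue.
  if (this._queue.length === 1) {
    process.nextTick(() => {
      const batch = this._queue
      this._queue = []
      this._batchLoadingFn(batch.map(item => item.key))
        .then(values => batch.forEach((item, i) => item.resolve(values[i])))
        .catch(error => batch.forEach(item => item.reject(error)))
    })
  }
  return promisedValue
}

// Count how many times the batch function runs.
let batchCalls = 0
const tickLoader = new TickLoader(keys => {
  batchCalls += 1
  return Promise.resolve(keys.map(k => k * 2))
})

const tickResults = Promise.all([tickLoader.load(1), tickLoader.load(2)])
tickResults.then(results => console.log(results, 'batch calls:', batchCalls))
```

Both .load() calls happen in the same tick, so they end up in one batch and the batch function runs once.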