Is it possible to perform both a map reduce with a lookup in the same query pipeline efficiently?
Let's say I've two collections:
{ _id, group_id, createdAt }{ _id, item_id }I want to get the top n item groups, based on the number of purchases on the most recent x items per group.
If I had the number of purchases available in the item documents, then I could aggregate and sort, but this is not the case.
I can get the most recent x items per group as so:
let x = 3;
let map = function () {
emit(this.group_id, { items: [this] });
};
let reduce = function (key, values) {
return { items: getLastXItems(x, values.map(v => v.items[0])) };
};
let scope = { x };
db.items.mapReduce(map, reduce, { out: { inline: 1 }, scope }, function(err, res) {
if (err) {
...
} else {
// res is an array of { group_id, items } where items is the last x items of the group
}
});
But I'm missing purchase count so I can't use it to sort groups, and output the top n groups (which btw I'm not even sure I can do)
I'm using this on a web server, and running the query with scope variable depending on the user context, so I don't want to output the result to another collection and have to do everything inline.
=== edit 1 === add data example:
Sample data could be:
// items
{ _id: '1, group_id: 'a', createdAt: 0 }
{ _id: '2, group_id: 'a', createdAt: 2 }
{ _id: '3, group_id: 'a', createdAt: 4 }
{ _id: '4, group_id: 'b', createdAt: 1 }
{ _id: '5, group_id: 'b', createdAt: 3 }
{ _id: '6, group_id: 'b', createdAt: 5 }
{ _id: '7, group_id: 'b', createdAt: 7 }
{ _id: '8, group_id: 'c', createdAt: 5 }
{ _id: '9, group_id: 'd', createdAt: 5 }
// purchases
{ _id: '1', item_id: '1' }
{ _id: '2', item_id: '1' }
{ _id: '3', item_id: '3' }
{ _id: '4', item_id: '5' }
{ _id: '5', item_id: '5' }
{ _id: '6', item_id: '6' }
{ _id: '7', item_id: '7' }
{ _id: '8', item_id: '7' }
{ _id: '9', item_id: '7' }
{ _id: '10', item_id: '3' }
{ _id: '11', item_id: '9' }
and sample result with n = 3 and x = 2 would be:
[
group_id: 'a', numberOfPurchasesOnLastXItems: 4,
group_id: 'b', numberOfPurchasesOnLastXItems: 3,
group_id: 'c', numberOfPurchasesOnLastXItems: 1,
]
I think this can be solved with the aggregation pipeline, but I've no idea on how bad this is, especially performance wise.
Concerns I have are:
Anyway, I think one solution I could be:
x = 2;
n = 3;
items.aggregate([
{
$lookup: {
from: 'purchases',
localField: '_id',
foreignField: 'item_id',
as: 'purchases',
},
},
/*
after the join, the data is like {
_id: <itemId>,
group_id: <itemGroupId>,
createdAt: <itemCreationDate>,
purchases: <arrayOfPurchases>,
}
*/
{
$project: {
group_id: 1,
createdAt: 1,
pruchasesCount: { $size: '$purchases' },
}
}
/*
after the projection, the data is like {
_id: <itemId>,
group_id: <itemGroupId>,
createdAt: <itemCreationDate>,
purchasesCount: <numberOfPurchases>,
}
*/
{
$sort: { createdAt: 1 }
},
{
$group: {
_id: '$group_id',
items: {
$push: '$purchasesCount',
}
}
}
/*
after the group, the data is like {
_id: <groupId>,
items: <array of number of purchases per item, sorted per item creation date>,
}
*/
{
$project: {
numberOfPurchasesOnMostRecentItems: { $sum: { $slice: ['$purchasesCount', x] } },
}
}
/*
after the projection, the data is like {
_id: <groupId>,
numberOfPurchasesOnMostRecentItems: <number of purchases on the last x items>,
}
*/
{
$sort: { numberOfPurchasesOnMostRecentItems: 1 }
},
{ $limit : n }
]);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With