My current aggregation is:
db.group_members.aggregate({
$match: { user_id: { $in: [1,2,3] } }
}, {
$group: { _id: "$group_id" }
}, {
$sort: { last_post_at: -1 }
}, {
$limit: 5
})
For a document structure of:
{
_id: '...',
user_id: '...',
group_id: '...',
last_post_at: Date,
}
I've also got an index on {user_id: 1, last_post_at: -1}
Since my index already covers last_post_at, is the sort useless? I'm not 100% sure how the ordering works here.
My end goal is to replicate this SQL:
SELECT DISTINCT ON (group_id) group_id
FROM group_members
WHERE user_id IN (1, 2, 3)
ORDER BY last_post_at DESC
LIMIT 5
I'm wondering how to make it performant for a very large group_members collection and still return the results in the right order.
UPDATE: I'm hoping to find a solution that will limit the number of documents loaded into memory. This will be a fairly large collection and accessed very frequently.
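Put the $sort before the $group, otherwise MongoDB can't use the index to help with sorting. Note also that your $group stage outputs only _id, so by the time the later $sort runs there is no last_post_at field left to sort on. If you need that field after grouping, carry it through the $group with an accumulator such as $first or $max.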
However, in your query it looks like you want to query for a relatively small number of user_ids compared to the total size of your group_members collection. So I recommend an index on user_id only. In that case MongoDB will have to sort your results in memory by last_post_at, but this is worthwhile in exchange for using an index for the initial lookup by user_id.
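As a rough sketch of the reordering described above, the pipeline might look something like this. It assumes the collection and field names from the question. The second $sort is my own addition, since $group does not guarantee the order of its output, and the pipeline and userIds variable names are just for illustration.

```javascript
// Sketch only: assumes the collection and field names from the question.
const userIds = [1, 2, 3];

const pipeline = [
  // 1. Narrow to the requested users first so an index on user_id can be used.
  { $match: { user_id: { $in: userIds } } },
  // 2. Sort while last_post_at still exists on every document.
  { $sort: { last_post_at: -1 } },
  // 3. One document per group; $first keeps the newest last_post_at
  //    because the input is already sorted newest-first.
  { $group: { _id: "$group_id", last_post_at: { $first: "$last_post_at" } } },
  // 4. $group does not guarantee output order, so sort the groups again.
  { $sort: { last_post_at: -1 } },
  // 5. Keep the five most recently active groups.
  { $limit: 5 },
];

// In the mongo shell this would be run as:
//   db.group_members.createIndex({ user_id: 1 });
//   db.group_members.aggregate(pipeline);
```

If you only need the latest timestamp per group, replacing $first with $max in the $group stage should give the same result without the first $sort, at the cost of not using the index for ordering at all.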