I am using trying to get documents from MongoDB on node. Let's say documents have the following structure:
{ "_id": ObjectId, "title" : String, "tags" : Array<String> }
I'd like to sort them by relevance - so when I'm looking for documents that have either "blue" or "yellow" tag I'd like to get ones with both tags first. So far I managed by google, trial and error:
var tags = [ "yellow", "blue" ];
db.collection('files').aggregate([
{ $project : { tags: 1 } },
{ $unwind : "$tags" },
{ $match : { "tags": { "$in": tags } } },
{ $group : { _id: "$_id", relevance: { $sum:1 } } },
{ $sort : { relevance : -1 } },
], function(err, success) {
console.log(success);
});
It works just fine, I get sorted collection of ids:
[{"_id":"5371355045002fc820a09566","relevance":2},{"_id":"53712fc6c8fcd124216de6cd","relevance":2},{"_id":"5371302ebd4725dc1b908316","relevance":1}]
Now I would make another query and ask for documents with those ids - but here's my question: can it be done in one query?
Yes you can as is always the case when you are actually grouping on _id
then that value is essentially equivalent to the whole document. So it is just a matter of storing the whole document under the _id
field.
You have a couple of approaches to this depending on your MongoDB version, and in versions prior to MongoDB 2.6 you must specify the whole document structure in an initial $project
stage ( which may optionally come after a $match
which is generally a good idea ) in your pipeline before you actually manipulate the document:
var tags = ["yellow","blue"];
db.collection.aggregate([
{ "$project" : {
"_id": {
"_id": "$_id",
"title": "$title",
"tags": "$tags"
},
"tags": 1
}},
{ "$unwind": "$tags" },
{ "$match": { "tags": { "$in": tags } } },
{ "$group": { "_id": "$_id", "relevance": { "$sum":1 } } },
{ "$sort": { "relevance" : -1 } },
{ "$project": {
_id: "$_id._id",
"title": "$_id.title",
"tags": "$_id.tags"
}}
])
And of course, at the end of the pipeline you extract the information from the _id
field in order to get back your original structure. That is optional, but you usually want that.
For MongoDB 2.6 and above there is a variable available to the pipeline stages that holds the structure of the document at that stage of the pipeline known as $$ROOT
, and you can access this as a kind of shortcut to the above form like so:
var tags = ["yellow","blue"];
db.collection.aggregate([
{ "$project" : {
"_id": "$$ROOT",
"tags": 1
}},
{ "$unwind": "$tags" },
{ "$match": { "tags": { "$in": tags } } },
{ "$group": { "_id": "$_id", "relevance": { "$sum":1 } } },
{ "$sort": { "relevance" : -1 } },
{ "$project": {
"_id": "$_id._id",
"title": "$_id.title",
"tags": "$_id.tags"
}}
])
Keeping in mind that in order to restore the document you still need to specify all the required fields.
I would note that as you are "filtering" documents with your match condition in this case and as was mentioned earlier, you should actually be filtering with a $match
statement at the "head" of your pipeline. This is the only place where the aggregation framework can select an index in order to optimize the query, and it also reduces the number of documents that do not meet your conditions ( presuming that not everything has the tags "yellow" or "blue" ) that go through the remaining pipeline stages:
db.collection.aggregate([
{ "$match": { "tags": { "$in": tags } } },
{ "$project" : {
"_id": {
"_id": "$_id",
"title": "$title",
"tags": "$tags"
},
"tags": 1
}},
{ "$unwind": "$tags" },
{ "$match": { "tags": { "$in": tags } } },
{ "$group": { "_id": "$_id", "relevance": { "$sum":1 } } },
{ "$sort": { "relevance" : -1 } },
{ "$project": {
_id: "$_id._id",
"title": "$_id.title",
"tags": "$_id.tags"
}}
])
At any rate that should be generally more effective than trying to do another query which of course would not maintain your sort order in the way that you have done.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With