Given the following MongoDB collection of documents :
{
title : 'shirt one'
tags : [
'shirt',
'cotton',
't-shirt',
'black'
]
},
{
title : 'shirt two'
tags : [
'shirt',
'white',
'button down collar'
]
},
{
title : 'shirt three'
tags : [
'shirt',
'cotton',
'red'
]
},
...
How do you retrieve a list of items matching a list of tags, ordered by the total number of matched tags? For example, given this list of tags as input:
['shirt', 'cotton', 'black']
I'd want to retrieve the items ranked in desc order by total number of matching tags:
item total matches
-------- --------------
Shirt One 3 (matched shirt + cotton + black)
Shirt Three 2 (matched shirt + cotton)
Shirt Two 1 (matched shirt)
In a relational schema, tags would be a separate table, and you could join against that table, count the matches, and order by the count.
But, in Mongo... ?
Seems this approach could work,
But I'm not clear on how to implement this as a Mongo query, or if this is even the most efficient approach.
As i answered in In MongoDB search in an array and sort by number of matches
It's possible using Aggregation Framework.
Assumptions
tags
attribute is a set (no repeated elements)Query
This approach forces you to unwind the results and reevaluate the match predicate with unwinded results, so its really inefficient.
db.test_col.aggregate(
{$match: {tags: {$in: ["shirt","cotton","black"]}}},
{$unwind: "$tags"},
{$match: {tags: {$in: ["shirt","cotton","black"]}}},
{$group: {
_id:{"_id":1},
matches:{$sum:1}
}},
{$sort:{matches:-1}}
);
Expected Results
{
"result" : [
{
"_id" : {
"_id" : ObjectId("5051f1786a64bd2c54918b26")
},
"matches" : 3
},
{
"_id" : {
"_id" : ObjectId("5051f1726a64bd2c54918b24")
},
"matches" : 2
},
{
"_id" : {
"_id" : ObjectId("5051f1756a64bd2c54918b25")
},
"matches" : 1
}
],
"ok" : 1
}
Right now, it isnt possible to do unless you use MapReduce. The only problem with MapReduce is that it is slow (compared to a normal query).
The aggregation framework is slated for 2.2 (so should be available in 2.1 dev release) and should make this sort of thing much easier to do without MapReduce.
Personally, I do not think using M/R is an efficient way to do it. I would rather query for all the documents and do those calculations on the application side. It is easier and cheaper to scale your app servers than it is to scale your database servers so let the app servers do the number crunching. Of those, this approach may not work for you given your data access patterns and requirements.
An even simpler approach may be to just include a count
property in each of your tag objects and whenever you $push
a new tag to the array, you also $inc
the count
property. This is a common pattern in the MongoDB world, at least until the aggregation framework.
I'll second @Bryan in saying that MapReduce is the only possible way at the moment (and it's far from perfect). But, in case you desperately need it, here you go :-)
var m = function() {
var searchTerms = ['shirt', 'cotton', 'black'];
var me = this;
this.tags.forEach(function(t) {
searchTerms.forEach(function(st) {
if(t == st) {
emit(me._id, {matches : 1});
}
})
})
};
var r = function(k, vals) {
var result = {matches : 0};
vals.forEach(function(v) {
result.matches += v.matches;
})
return result;
};
db.shirts.mapReduce(m, r, {out: 'found01'});
db.found01.find();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With