The main collection is retailers, which contains an array of stores. Each store contains an array of offers (the items you can buy in that store), and each offer has an array of sizes (see the example below).
I am trying to find all offers that are available in size L.
{
  "_id" : ObjectId("56f277b1279871c20b8b4567"),
  "stores" : [
    {
      "_id" : ObjectId("56f277b5279871c20b8b4783"),
      "offers" : [
        { "_id" : ObjectId("56f277b1279871c20b8b4567"), "size": [ "XS", "S", "M" ] },
        { "_id" : ObjectId("56f277b1279871c20b8b4567"), "size": [ "S", "L", "XL" ] }
      ]
    }
  ]
}
I tried this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect output like this:
{
  "_id" : ObjectId("56f277b1279871c20b8b4567"),
  "stores" : [
    {
      "_id" : ObjectId("56f277b5279871c20b8b4783"),
      "offers" : [
        { "_id" : ObjectId("56f277b1279871c20b8b4567"), "size": [ "S", "L", "XL" ] }
      ]
    }
  ]
}
But the output of my query also contains the non-matching offer with sizes XS, S and M.
How can I force MongoDB to return only the offers that match my query?
Greetings and thanks.
Filter MongoDB array elements using the $filter operator. This operator takes three fields: input, the array to filter; cond, the condition each element must meet to be kept; and as, an optional name for the variable that represents each element of the input array.
To search an array of objects in MongoDB, you can use the $elemMatch operator, which lets you apply more than one condition to the same array element.
Definition. $elemMatch. The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.
To perform an update on all embedded array elements of each document that matches your query, use the filtered positional operator $[<identifier>] . The filtered positional operator $[<identifier>] specifies the matching array elements in the update document.
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail, you should not even try, or at least not go beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outer" most array element.
db.getCollection('retailers').find( { 'stores.offers.size': 'L'}, { 'stores.$': 1 } )
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But that does nothing for the inner array of "offers", and as such every "offer" within the matched "stores" array would still be returned.
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find( { 'stores.offers.size': 'L'}, { 'stores.$.offers.$': 1 } )
The only tool MongoDB actually has for this level of manipulation is the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
Here is how you can achieve this, by version.
First, with MongoDB 3.2.x, using the $filter operation:
db.getCollection('retailers').aggregate([
  { "$match": { "stores.offers.size": "L" } },
  { "$project": {
    "stores": {
      "$filter": {
        "input": {
          "$map": {
            "input": "$stores",
            "as": "store",
            "in": {
              "_id": "$$store._id",
              "offers": {
                "$filter": {
                  "input": "$$store.offers",
                  "as": "offer",
                  "cond": { "$setIsSubset": [ ["L"], "$$offer.size" ] }
                }
              }
            }
          }
        },
        "as": "store",
        "cond": { "$ne": [ "$$store.offers", [] ] }
      }
    }
  }}
])
Then, with MongoDB 2.6.x and above, using $map and $setDifference:
db.getCollection('retailers').aggregate([
  { "$match": { "stores.offers.size": "L" } },
  { "$project": {
    "stores": {
      "$setDifference": [
        { "$map": {
          "input": {
            "$map": {
              "input": "$stores",
              "as": "store",
              "in": {
                "_id": "$$store._id",
                "offers": {
                  "$setDifference": [
                    { "$map": {
                      "input": "$$store.offers",
                      "as": "offer",
                      "in": {
                        "$cond": {
                          "if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
                          "then": "$$offer",
                          "else": false
                        }
                      }
                    }},
                    [false]
                  ]
                }
              }
            }
          },
          "as": "store",
          "in": {
            "$cond": {
              "if": { "$ne": [ "$$store.offers", [] ] },
              "then": "$$store",
              "else": false
            }
          }
        }},
        [false]
      ]
    }
  }}
])
And finally, in any version from MongoDB 2.2.x, where the aggregation framework was introduced:
db.getCollection('retailers').aggregate([
  { "$match": { "stores.offers.size": "L" } },
  { "$unwind": "$stores" },
  { "$unwind": "$stores.offers" },
  { "$match": { "stores.offers.size": "L" } },
  { "$group": {
    "_id": {
      "_id": "$_id",
      "storeId": "$stores._id"
    },
    "offers": { "$push": "$stores.offers" }
  }},
  { "$group": {
    "_id": "$_id._id",
    "stores": {
      "$push": {
        "_id": "$_id.storeId",
        "offers": "$offers"
      }
    }
  }}
])
Let's break down the explanations.
So generally speaking, $filter is the way to go here since it is designed with this purpose in mind. Since there are multiple levels of the array, you need to apply this at each level. So first you are diving into each "offers" within "stores" to examine and $filter that content.
The simple comparison here is "Does the "size" array contain the element I am looking for". In this logical context, the short thing to do is use the $setIsSubset operation to compare an array ("set") of ["L"] to the target array. Where that condition is true (it contains "L"), the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned; otherwise it is removed.
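To make the two levels easier to see, here is a minimal sketch that assembles the same 3.2-style pipeline from named pieces and takes the size as a parameter. The helper name buildSizeFilterPipeline is hypothetical (it is not part of MongoDB), and the sketch assumes the same field names as the sample document.

```javascript
// Hypothetical helper: builds the two-level $filter pipeline for any size value.
function buildSizeFilterPipeline(size) {
  // Inner $filter: keep only the offers whose "size" array contains `size`.
  const matchingOffers = {
    $filter: {
      input: "$$store.offers",
      as: "offer",
      cond: { $setIsSubset: [[size], "$$offer.size"] }
    }
  };

  // $map rewrites every store so that it carries only its matching offers.
  const storesWithMatchingOffers = {
    $map: {
      input: "$stores",
      as: "store",
      in: { _id: "$$store._id", offers: matchingOffers }
    }
  };

  return [
    // Select only documents that contain at least one matching offer.
    { $match: { "stores.offers.size": size } },
    // Outer $filter: drop the stores left with an empty "offers" array.
    {
      $project: {
        stores: {
          $filter: {
            input: storesWithMatchingOffers,
            as: "store",
            cond: { $ne: ["$$store.offers", []] }
          }
        }
      }
    }
  ];
}

// In the shell (MongoDB 3.2 or later):
// db.getCollection('retailers').aggregate(buildSizeFilterPipeline("L"))
```

Building the stages as plain objects like this is only a readability choice; the server receives exactly the same pipeline as the literal listing.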
This is very similar to the modern process, except that since there is no $filter in MongoDB 2.6 you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation just decides whether to return the element or instead a false value. In the comparison of $setDifference to a single element "set" of [false], all false elements in the returned array would be removed.
In all other ways, the logic is the same as above.
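If the "map to false, then subtract [false]" idea is hard to picture, here is a rough emulation in plain JavaScript. It is not MongoDB API, and the names mapThenDropFalse and filterStores and the string ids are made up for the illustration.

```javascript
// Plain-JavaScript emulation (not MongoDB API): mark rejects as false, then strip them.
function mapThenDropFalse(items, keep) {
  // Like $map + $cond: same length as the input, with false in place of rejects.
  const marked = items.map(item => (keep(item) ? item : false));
  // Like $setDifference against [false]: remove every false marker.
  return marked.filter(item => item !== false);
}

function filterStores(stores, size) {
  // Inner level: reduce each store's offers to the ones that have the size.
  const withFilteredOffers = stores.map(store => ({
    _id: store._id,
    offers: mapThenDropFalse(store.offers, offer => offer.size.includes(size))
  }));
  // Outer level: drop the stores whose offers array ended up empty.
  return mapThenDropFalse(withFilteredOffers, store => store.offers.length > 0);
}

// Made-up sample data with string ids instead of ObjectId values.
const sampleStores = [
  {
    _id: "store1",
    offers: [
      { _id: "offer1", size: ["XS", "S", "M"] },
      { _id: "offer2", size: ["S", "L", "XL"] }
    ]
  },
  { _id: "store2", offers: [{ _id: "offer3", size: ["M"] }] }
];

console.log(JSON.stringify(filterStores(sampleStores, "L")));
```

One difference to keep in mind: the real $setDifference treats its inputs as sets, so duplicate elements would collapse. That should be harmless here because every element carries a distinct _id.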
So below MongoDB 2.6 the only tool for working with arrays is $unwind, and you should not use the aggregation framework "just" for this purpose.
The process indeed appears simple: "take apart" each array, filter out the things you don't need, then put it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
But the problem is that $unwind is very costly. Though it still has a purpose, its main intended use is not this sort of filtering per document. In fact, in modern releases its only use should be when an element of the array(s) needs to become part of the "grouping key" itself.
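As a rough illustration of what the $unwind approach does to a single document, here is a plain-JavaScript emulation (not MongoDB API; the function name and the string ids are made up). It counts the intermediate rows to show that every (store, offer) pair becomes its own copy before anything is filtered.

```javascript
// Plain-JavaScript emulation (not MongoDB API) of $unwind, $unwind, $match, $group, $group.
function unwindFilterRegroup(doc, size) {
  // Two $unwind stages: one row per (store, offer) pair.
  const rows = [];
  for (const store of doc.stores) {
    for (const offer of store.offers) {
      rows.push({ _id: doc._id, storeId: store._id, offer: offer });
    }
  }

  // Second $match: keep only the rows whose offer has the requested size.
  const matched = rows.filter(row => row.offer.size.includes(size));

  // First $group: rebuild each store's "offers" array, keyed by the store _id.
  const offersByStore = new Map();
  for (const row of matched) {
    if (!offersByStore.has(row.storeId)) offersByStore.set(row.storeId, []);
    offersByStore.get(row.storeId).push(row.offer);
  }

  // Second $group: rebuild the outer "stores" array for the document.
  const stores = [];
  for (const [storeId, offers] of offersByStore) {
    stores.push({ _id: storeId, offers: offers });
  }
  return { rowCount: rows.length, result: { _id: doc._id, stores: stores } };
}

// Made-up sample document with string ids instead of ObjectId values.
const sampleDoc = {
  _id: "retailer1",
  stores: [
    {
      _id: "store1",
      offers: [
        { _id: "offer1", size: ["XS", "S", "M"] },
        { _id: "offer2", size: ["S", "L", "XL"] }
      ]
    },
    { _id: "store2", offers: [{ _id: "offer3", size: ["M"] }] }
  ]
};

const outcome = unwindFilterRegroup(sampleDoc, "L");
console.log(outcome.rowCount, JSON.stringify(outcome.result));
```

The number of intermediate rows grows with the total number of offers in the document, not with the number of matches, which is where the cost comes from.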
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general though, those listings still have an amount of complexity to them, and unless such filtering drastically reduces the content returned, in a way that makes a significant improvement in bandwidth used between the server and client, you are better off filtering the result of the initial query and basic projection.
db.getCollection('retailers').find(
  { 'stores.offers.size': 'L' },
  { 'stores.$': 1 }
).forEach(function(doc) {
  // Technically this is only "one" store. So omit the projection
  // if you wanted more than "one" match
  doc.stores = doc.stores.filter(function(store) {
    store.offers = store.offers.filter(function(offer) {
      return offer.size.indexOf("L") != -1;
    });
    return store.offers.length != 0;
  });
  printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated, the only "real" difference would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
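If you want the client-side filtering as a reusable piece, a minimal sketch could look like the following. The function name keepOffersWithSize is hypothetical, and unlike the forEach loop it returns a filtered copy rather than modifying the fetched document.

```javascript
// Hypothetical helper: returns a filtered copy of a retailer document.
function keepOffersWithSize(doc, size) {
  const stores = doc.stores
    // Copy each store, keeping only the offers that include the size.
    .map(function (store) {
      return Object.assign({}, store, {
        offers: store.offers.filter(function (offer) {
          return offer.size.indexOf(size) !== -1;
        })
      });
    })
    // Drop the stores that have no matching offers left.
    .filter(function (store) {
      return store.offers.length !== 0;
    });
  return Object.assign({}, doc, { stores: stores });
}

// In the shell, without the positional projection so every store is considered:
// db.getCollection('retailers').find({ 'stores.offers.size': 'L' })
//   .forEach(function (doc) { printjson(keepOffersWithSize(doc, 'L')); })
```

Returning a copy is a design choice, not a requirement; it just keeps the original document available if you need the unfiltered data later in the same script.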
But unless you are doing this in a modern release with only $match and $project, the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{ "_id" : ObjectId("56f277b1279871c20b8b4567"), "stores" : [ { "_id" : ObjectId("56f277b5279871c20b8b4783"), "offers" : [ { "_id" : ObjectId("56f277b1279871c20b8b4567"), "size" : [ "S", "L", "XL" ] } ] } ] }
As your array is embedded, we cannot use $elemMatch; instead you can use the aggregation framework to get your results:
db.retailers.aggregate([
  { $match: { "stores.offers.size": 'L' } }, // just a precondition, can be skipped
  { $unwind: "$stores" },
  { $unwind: "$stores.offers" },
  { $match: { "stores.offers.size": 'L' } },
  { $group: {
    _id: { id: "$_id", "storesId": "$stores._id" },
    "offers": { $push: "$stores.offers" }
  }},
  { $group: {
    _id: "$_id.id",
    stores: { $push: { _id: "$_id.storesId", "offers": "$offers" } }
  }}
]).pretty()
What this query does is unwind the arrays (twice), match on size, and then reshape the document back to its previous form. You can remove the $group steps and see how it prints. Have fun!