
Aggregation filter after $lookup

How can I add a filter after a $lookup, or is there another way to do this?

My data collection test is:

    { "_id" : ObjectId("570557d4094a4514fc1291d6"), "id" : 100, "value" : "0", "contain" : [ ] }
    { "_id" : ObjectId("570557d4094a4514fc1291d7"), "id" : 110, "value" : "1", "contain" : [ 100 ] }
    { "_id" : ObjectId("570557d4094a4514fc1291d8"), "id" : 120, "value" : "1", "contain" : [ 100 ] }
    { "_id" : ObjectId("570557d4094a4514fc1291d9"), "id" : 121, "value" : "2", "contain" : [ 100, 120 ] }

I select id 100 and aggregate the childs:

    db.test.aggregate([
      { $match : { id: 100 } },
      { $lookup : {
          from : "test",
          localField : "id",
          foreignField : "contain",
          as : "childs"
      } }
    ]);

I get back:

    {
      "_id":ObjectId("570557d4094a4514fc1291d6"),
      "id":100,
      "value":"0",
      "contain":[ ],
      "childs":[
        {
          "_id":ObjectId("570557d4094a4514fc1291d7"),
          "id":110,
          "value":"1",
          "contain":[ 100 ]
        },
        {
          "_id":ObjectId("570557d4094a4514fc1291d8"),
          "id":120,
          "value":"1",
          "contain":[ 100 ]
        },
        {
          "_id":ObjectId("570557d4094a4514fc1291d9"),
          "id":121,
          "value":"2",
          "contain":[ 100, 120 ]
        }
      ]
    }

But I want only the childs entries whose value is "1".

At the end I expect this result:

    {
      "_id":ObjectId("570557d4094a4514fc1291d6"),
      "id":100,
      "value":"0",
      "contain":[ ],
      "childs":[
        {
          "_id":ObjectId("570557d4094a4514fc1291d7"),
          "id":110,
          "value":"1",
          "contain":[ 100 ]
        },
        {
          "_id":ObjectId("570557d4094a4514fc1291d8"),
          "id":120,
          "value":"1",
          "contain":[ 100 ]
        }
      ]
    }
Phillip Bartschinski asked Apr 06 '16 18:04



1 Answer

The question here is actually about something different and does not need $lookup at all. But for anyone arriving here purely from the title, "filtering after $lookup", these are the techniques for you:

MongoDB 3.6 - Sub-pipeline

    db.test.aggregate([
        { "$match": { "id": 100 } },
        { "$lookup": {
          "from": "test",
          "let": { "id": "$id" },
          "pipeline": [
            { "$match": {
              "value": "1",
              "$expr": { "$in": [ "$$id", "$contain" ] }
            }}
          ],
          "as": "childs"
        }}
    ])

Earlier - $lookup + $unwind + $match coalescence

    db.test.aggregate([
        { "$match": { "id": 100 } },
        { "$lookup": {
            "from": "test",
            "localField": "id",
            "foreignField": "contain",
            "as": "childs"
        }},
        { "$unwind": "$childs" },
        { "$match": { "childs.value": "1" } },
        { "$group": {
            "_id": "$_id",
            "id": { "$first": "$id" },
            "value": { "$first": "$value" },
            "contain": { "$first": "$contain" },
            "childs": { "$push": "$childs" }
        }}
    ])

If you wonder why you would $unwind rather than use $filter on the array, read "Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size" for all the detail on why this is generally necessary and far more efficient.

From MongoDB 3.6 onwards, the more expressive "sub-pipeline" is generally what you want: it filters the documents of the foreign collection before anything gets returned into the array at all.
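To make the "filter before anything reaches the array" idea concrete, here is a minimal plain-JavaScript model of what the sub-pipeline form computes for the sample data. It runs without MongoDB; `lookupChilds` and `testDocs` are hypothetical names used only for illustration, and the `_id` fields are left out for brevity.

```javascript
// Plain-JavaScript model of the sub-pipeline $lookup (illustration only,
// not a driver or server API).
const testDocs = [
  { id: 100, value: "0", contain: [] },
  { id: 110, value: "1", contain: [100] },
  { id: 120, value: "1", contain: [100] },
  { id: 121, value: "2", contain: [100, 120] }
];

// Hypothetical helper: attaches only the foreign documents that pass the
// sub-pipeline's $match, so rejected documents never enter `childs`.
function lookupChilds(parent, foreign) {
  // Mirrors { value: "1", $expr: { $in: [ "$$id", "$contain" ] } }
  const childs = foreign.filter(
    (doc) => doc.value === "1" && doc.contain.includes(parent.id)
  );
  return { ...parent, childs };
}

const joined = testDocs
  .filter((doc) => doc.id === 100) // mirrors { $match: { id: 100 } }
  .map((parent) => lookupChilds(parent, testDocs));

console.log(JSON.stringify(joined, null, 2));
```

The document with id 121 is rejected inside the helper, which is the point of the sub-pipeline: nothing has to be removed from the array afterwards.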

Back to the original answer, though, which explains why the question as asked needs no join at all.


Original

Using $lookup like this is not the most "efficient" way to do what you want here. But more on this later.

As a basic concept, just use $filter on the resulting array:

    db.test.aggregate([
        { "$match": { "id": 100 } },
        { "$lookup": {
            "from": "test",
            "localField": "id",
            "foreignField": "contain",
            "as": "childs"
        }},
        { "$project": {
            "id": 1,
            "value": 1,
            "contain": 1,
            "childs": {
               "$filter": {
                   "input": "$childs",
                   "as": "child",
                   "cond": { "$eq": [ "$$child.value", "1" ] }
               }
            }
        }}
    ]);

Or use $redact instead:

    db.test.aggregate([
        { "$match": { "id": 100 } },
        { "$lookup": {
            "from": "test",
            "localField": "id",
            "foreignField": "contain",
            "as": "childs"
        }},
        { "$redact": {
            "$cond": {
               "if": {
                  "$or": [
                    { "$eq": [ "$value", "0" ] },
                    { "$eq": [ "$value", "1" ] }
                  ]
               },
               "then": "$$DESCEND",
               "else": "$$PRUNE"
            }
        }}
    ]);

Both get the same result:

    {
      "_id":ObjectId("570557d4094a4514fc1291d6"),
      "id":100,
      "value":"0",
      "contain":[ ],
      "childs":[
        {
          "_id":ObjectId("570557d4094a4514fc1291d7"),
          "id":110,
          "value":"1",
          "contain":[ 100 ]
        },
        {
          "_id":ObjectId("570557d4094a4514fc1291d8"),
          "id":120,
          "value":"1",
          "contain":[ 100 ]
        }
      ]
    }
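The $redact variant tests "$value" at every document level, which is why its condition has to accept the parent's "0" as well as the children's "1". The following is a simplified plain-JavaScript model of $$DESCEND and $$PRUNE, written under that reading of the stage; `redact`, `isDoc` and `joinedDoc` are hypothetical names, `_id` is omitted, and this is not how the server implements it.

```javascript
// Simplified model of $redact with $$DESCEND / $$PRUNE (illustration only).
const isDoc = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

function redact(doc, keep) {
  if (!keep(doc)) return undefined; // $$PRUNE: drop this whole (sub)document
  const out = {};
  // $$DESCEND: keep scalar fields, re-apply the test to embedded documents
  for (const [key, val] of Object.entries(doc)) {
    if (Array.isArray(val)) {
      out[key] = val
        .map((el) => (isDoc(el) ? redact(el, keep) : el))
        .filter((el) => el !== undefined);
    } else if (isDoc(val)) {
      const sub = redact(val, keep);
      if (sub !== undefined) out[key] = sub;
    } else {
      out[key] = val;
    }
  }
  return out;
}

// Shape of the document after the plain $lookup, before any filtering.
const joinedDoc = {
  id: 100, value: "0", contain: [],
  childs: [
    { id: 110, value: "1", contain: [100] },
    { id: 120, value: "1", contain: [100] },
    { id: 121, value: "2", contain: [100, 120] }
  ]
};

const redacted = redact(joinedDoc, (d) => d.value === "0" || d.value === "1");
const parentPruned = redact(joinedDoc, (d) => d.value === "1");

console.log(redacted.childs.map((c) => c.id)); // [ 110, 120 ]
console.log(parentPruned);                     // undefined
```

Testing only for "1" prunes the parent at the top level, so nothing is returned at all. That coupling between the parent's and the children's values is a reason to prefer $filter when the two share a field name.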

The bottom line is that the localField/foreignField form of $lookup (the only form before the 3.6 sub-pipeline shown above) cannot apply a query to select only certain data. So all "filtering" needs to happen after the $lookup.

But really for this type of "self join" you are better off not using $lookup at all and avoiding the overhead of an additional read and "hash-merge" entirely. Just fetch the related items and $group instead:

    db.test.aggregate([
      { "$match": {
        "$or": [
          { "id": 100 },
          { "contain.0": 100, "value": "1" }
        ]
      }},
      { "$group": {
        "_id": {
          "$cond": {
            "if": { "$eq": [ "$value", "0" ] },
            "then": "$id",
            "else": { "$arrayElemAt": [ "$contain", 0 ] }
          }
        },
        "value": { "$first": { "$literal": "0"} },
        "childs": {
          "$push": {
            "$cond": {
              "if": { "$ne": [ "$value", "0" ] },
              "then": "$$ROOT",
              "else": null
            }
          }
        }
      }},
      { "$project": {
        "value": 1,
        "childs": {
          "$filter": {
            "input": "$childs",
            "as": "child",
            "cond": { "$ne": [ "$$child", null ] }
          }
        }
      }}
    ])

Which only comes out a little different because I deliberately removed the extraneous fields. Add them in yourself if you really want to:

    {
      "_id" : 100,
      "value" : "0",
      "childs" : [
        {
          "_id" : ObjectId("570557d4094a4514fc1291d7"),
          "id" : 110,
          "value" : "1",
          "contain" : [ 100 ]
        },
        {
          "_id" : ObjectId("570557d4094a4514fc1291d8"),
          "id" : 120,
          "value" : "1",
          "contain" : [ 100 ]
        }
      ]
    }

So the only real issue here is filtering out the null that $push adds to the array when the document being processed is the parent itself.
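As a rough illustration of where that null comes from, here is a plain-JavaScript model of the $group and $project steps applied to the three documents that pass the $match. The names `matched`, `groups` and `grouped` are hypothetical, `_id` fields of the source documents are omitted, and grouping with a Map is only an approximation of the stage.

```javascript
// Documents that survive the "$or" $match (assumed sample data).
const matched = [
  { id: 100, value: "0", contain: [] },
  { id: 110, value: "1", contain: [100] },
  { id: 120, value: "1", contain: [100] }
];

// Model of $group: the key is the parent's own id, or the child's first
// `contain` entry. The parent pushes null, children push themselves.
const groups = new Map();
for (const doc of matched) {
  const isParent = doc.value === "0";
  const key = isParent ? doc.id : doc.contain[0];
  if (!groups.has(key)) {
    groups.set(key, { _id: key, value: "0", childs: [] });
  }
  groups.get(key).childs.push(isParent ? null : doc);
}

// Model of the final $project + $filter: strip the null placeholder.
const grouped = [...groups.values()].map((g) => ({
  ...g,
  childs: g.childs.filter((child) => child !== null)
}));

console.log(JSON.stringify(grouped, null, 2));
```

Before the last step the group for 100 holds one null (from the parent) followed by the two children; after it, only the children remain.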


What you also seem to be missing here is that the result you are looking for does not need aggregation or "sub-queries" at all. The structure that you have arrived at, or possibly found elsewhere, is "designed" so that you can get a "node" and all of its "children" in a single query request.

That means the "query" alone is all that is really needed. Collecting the data (which is all that is happening, since no content is really being "reduced") is just a matter of iterating the cursor result:

    var result = {};

    db.test.find({
      "$or": [
        { "id": 100 },
        { "contain.0": 100, "value": "1" }
      ]
    }).sort({ "contain.0": 1 }).forEach(function(doc) {
      if ( doc.id == 100 ) {
        result = doc;
        result.childs = []
      } else {
        result.childs.push(doc)
      }
    })

    printjson(result);

This does exactly the same thing:

    {
      "_id" : ObjectId("570557d4094a4514fc1291d6"),
      "id" : 100,
      "value" : "0",
      "contain" : [ ],
      "childs" : [
        {
          "_id" : ObjectId("570557d4094a4514fc1291d7"),
          "id" : 110,
          "value" : "1",
          "contain" : [ 100 ]
        },
        {
          "_id" : ObjectId("570557d4094a4514fc1291d8"),
          "id" : 120,
          "value" : "1",
          "contain" : [ 100 ]
        }
      ]
    }

And serves as proof that all you really need to do here is issue the "single" query to select both the parent and children. The returned data is just the same, and all you are doing on either server or client is "massaging" into another collected format.
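Outside the shell, the same client-side "massaging" can be written as a small function over whatever documents the query returned. This is a sketch rather than driver code: `collectNode` is a hypothetical helper, the input is assumed to be an array (or any iterable) of already-fetched documents, and `_id` is omitted from the sample data.

```javascript
// Hypothetical helper: builds { ...parent, childs: [...] } from the
// documents returned by the single "$or" query.
function collectNode(docs, parentId) {
  let parent = null;
  const childs = [];
  for (const doc of docs) {
    if (doc.id === parentId) {
      parent = { ...doc };   // copy, so the input document is not mutated
    } else {
      childs.push(doc);
    }
  }
  if (parent === null) return null; // parent was not in the result set
  parent.childs = childs;
  return parent;
}

// Assumed query result, deliberately with the parent last.
const fetched = [
  { id: 110, value: "1", contain: [100] },
  { id: 120, value: "1", contain: [100] },
  { id: 100, value: "0", contain: [] }
];

const node = collectNode(fetched, 100);
console.log(JSON.stringify(node, null, 2));
```

Unlike the shell loop, this version collects children separately, so it does not rely on the sort placing the parent first. Keeping the sort is still reasonable if the order of the children matters.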

This is one of those cases where you can get "caught up" in thinking of how you did things in a "relational" database, and not realize that since the way the data is stored has "changed", you no longer need to use the same approach.

That is exactly the point of the structure in the documentation example "Model Tree Structures with Child References": it makes it easy to select parents and children within one query.

Neil Lunn answered Oct 13 '22 10:10