Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using $graphLookup with a condition

I am trying to find the ancestors of a given child. At some point, that child's ancestors last name changed. I want to find the last parent of this child with a given last name. For example:

{
    "_id":1,
    "parent":null
    "first":"Bob",
    "last":"Sagget"
},
{
    "_id":2,
    "parent":1,
    "first":"Jane",
    "last":"Dor"
},
{
    "_id":3,
    "parent":2,
    "first":"Crane",
    "last":"Dor"
},
{
    "_id":4,
    "parent":3,
    "first":"Ho",
    "last":"Dor"
},
{
    "_id":5,
    "parent":4,
    "first":"Mor",
    "last":"Dor"
}

I want to query for _id 5 and get the ancestors with the last name Dor. There will be other people with the name Dor in this database that I don't want to see, so I can't just query for last name of "Dor".

Here is my current aggregation query - this is giving me every ancestor all the way to _id of 1. How can I stop at _id of 2?:

db.PeopleDb.aggregate(
[
    {
        $graphLookup: {
                "from": "PeopleDb", 
                "startWith": "$parent", 
                "connectFromField": "parent", 
                "connectToField": "_id", 
                "as": "linearAncestors"
            }
        },
        {
        $match: {
        "_id":5
        }
    },
]);
like image 726
user0000001 Avatar asked Jun 14 '18 14:06

user0000001


1 Answers

The basic query you are looking for instead actually "starts" with the restriction on matching just the singular document with _id: 5 and then performs the $graphLookup in order to find the ancestors.

As for the "condition", there are some steps involved so it is best to walk through the whole process and understand what is happening:

db.PeopleDb.aggregate([
  { "$match": { "_id": 5 } },
  { "$graphLookup": {
    "from": "PeopleDb",
    "startWith": "$parent",
    "connectFromField": "parent",
    "connectToField": "_id",
    "as": "people",
    "depthField": "depth"
  }}
])

This then returns all the recursive chain connecting the parent to the _id, noting also the "optional" "depthField" setting to include the actual "depth" of the match within the returned results:

{
    "_id" : 5,
    "parent" : 4,
    "first" : "Mor",
    "last" : "Dor",
    "people" : [
        {
            "_id" : 1,
            "parent" : null,
            "first" : "Bob",
            "last" : "Sagget",
            "depth" : NumberLong(3)
        },
        {
            "_id" : 2,
            "parent" : 1,
            "first" : "Jane",
            "last" : "Dor",
            "depth" : NumberLong(2)
        },
        {
            "_id" : 3,
            "parent" : 2,
            "first" : "Crane",
            "last" : "Dor",
            "depth" : NumberLong(1)
        },
        {
            "_id" : 4,
            "parent" : 3,
            "first" : "Ho",
            "last" : "Dor",
            "depth" : NumberLong(0)
        }
    ]
}

So noting that 4 is returned since it is the parent of the initial document, then 3 for it's parent and then 2 as it's parent and so on.

You can restrict the "depth" of matching using the "optional" "maxDepth" parameter to the pipeline stage, or in the specific case of "excluding the ROOT" element you can use the "restrictSearchWithMatch" option to simply exclude results with a null parent:

db.PeopleDb.aggregate([
  { "$match": { "_id": 5 } },
  { "$graphLookup": {
    "from": "PeopleDb",
    "startWith": "$parent",
    "connectFromField": "parent",
    "connectToField": "_id",
    "as": "people",
    "depthField": "depth",
    "restrictSearchWithMatch": { "parent": { "$ne": null } }
  }}
])

Which returns the same result, but excludes the "ROOT" document where the "parent" field is null:

{
    "_id" : 5,
    "parent" : 4,
    "first" : "Mor",
    "last" : "Dor",
    "people" : [
        {
            "_id" : 2,
            "parent" : 1,
            "first" : "Jane",
            "last" : "Dor",
            "depth" : NumberLong(2)
        },
        {
            "_id" : 3,
            "parent" : 2,
            "first" : "Crane",
            "last" : "Dor",
            "depth" : NumberLong(1)
        },
        {
            "_id" : 4,
            "parent" : 3,
            "first" : "Ho",
            "last" : "Dor",
            "depth" : NumberLong(0)
        }
    ]
}

And of course the same principle applies to your "last" condition which can match only documents where that condition was true. Here I'll show both conditions, but the $or is optional:

db.PeopleDb.aggregate([
  { "$match": { "_id": 5 } },
  { "$graphLookup": {
    "from": "PeopleDb",
    "startWith": "$parent",
    "connectFromField": "parent",
    "connectToField": "_id",
    "as": "people",
    "depthField": "depth",
    "restrictSearchWithMatch": {
      "$or": [
        { "parent": { "$ne": null }},
        { "last": "Dor" }
      ]
    }
  }}
])

Note however that the "restrictSearchWithMatch" is a "recursive" condition, therefore if any "ancestor" in the chain does not meet the "last" condition then the chain is broken and further ancestors are not retrieved. In order to get "all the ancestors" yet only show those who have the matching "last" then you $filter the resulting array content instead:

db.PeopleDb.aggregate([
  { "$match": { "_id": 5 } },
  { "$graphLookup": {
    "from": "PeopleDb",
    "startWith": "$parent",
    "connectFromField": "parent",
    "connectToField": "_id",
    "as": "people",
    "depthField": "depth",
  }},
  { "$addFields": {
    "people": {
      "$filter": {
        "input": "$people",
        "cond": { "$eq": [ "$$this.last", "Dor" ] }
      }
    }
  }}
])

Or in fact "dynamically" by comparing to the value of the initial matched document to which the "ancestors" are related to by using the field value expression instead of a hard coded value:

db.PeopleDb.aggregate([
  { "$match": { "_id": 5 } },
  { "$graphLookup": {
    "from": "PeopleDb",
    "startWith": "$parent",
    "connectFromField": "parent",
    "connectToField": "_id",
    "as": "people",
    "depthField": "depth",
  }},
  { "$addFields": {
    "people": {
      "$filter": {
        "input": "$people",
        "cond": { "$eq": [ "$$this.last", "$last" ] }
      }
    }
  }}
])

In this case it's the same 2,3,4 ancestors, but in the event that say ancestor 3 actually had a different value for "last" then using $filter actually returns 2,4 whilst the "restrictSearchWithMatch" would only return 4 since the 3 would "break the chain". That's the main difference.

N.B The one thing you cannot do of yet using $graphLookup is that "field comparison expression" is not allowed. If you wanted something along those lines then you would do further manipulation with $filter and possibly other operations such as $indexOfArray if your actual intent was indeed to "break the chain" at the point where that "field comparison" was not met within a recursive search.

like image 83
Neil Lunn Avatar answered Nov 13 '22 19:11

Neil Lunn