Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB's $graphLookup trying to get a tree structure

Tags:

mongodb

I'm trying to work with the new MongoDB v3.4 $graphLookup aggregation pipeline. I have this simple tree collection, with some nodes and a parent DBRef:

{ "_id" : ObjectId("59380657bbdbfb36c18a80f2"), "name" : "Root node 1" },
{ "_id" : ObjectId("5938068abbdbfb36c18a80f5"), "name" : "Child 1.1", "parent" : ObjectId("59380657bbdbfb36c18a80f2") },
{ "_id" : ObjectId("593806b0bbdbfb36c18a80f7"), "name" : "Subchild 1.1.1", "parent" : ObjectId("5938068abbdbfb36c18a80f5") },
{ "_id" : ObjectId("5938068abbdbfb36c18a80f6"), "name" : "Child 1.2", "parent" : ObjectId("59380657bbdbfb36c18a80f2") },
{ "_id" : ObjectId("59380657bbdbfb36c18a80f3"), "name" : "Root node 2" }

I would like to get this kind of tree structure:

- Root node 1
    - Child 1.1
        - Subchild 1.1.1
    - Child 1.2
- Root node 2

So, I'm trying to work with the new $graphLookup aggregation pipeline, like this:

db.getCollection('tree').aggregate([
    { $match: { parent: { $exists: false } } },
    {
        $graphLookup: {
            from: "tree",
            startWith: "$_id",
            connectFromField: "_id",
            connectToField: "parent",
            as: "children"
        }
   },
   { $sort: { name: 1 } }
])

But my issue is that I get all child of "Root node 1" in one collection:

{
    "_id" : ObjectId("59380657bbdbfb36c18a80f2"),
    "name" : "Root node 1",
    "children" : [
        { "_id" : ObjectId("593806b0bbdbfb36c18a80f7"), "name" : "Subchild 1.1.1", "parent" : ObjectId("5938068abbdbfb36c18a80f5") },
        { "_id" : ObjectId("5938068abbdbfb36c18a80f6"), "name" : "Child 1.2", "parent" : ObjectId("59380657bbdbfb36c18a80f2") },
        { "_id" : ObjectId("5938068abbdbfb36c18a80f5"), "name" : "Child 1.1", "parent" : ObjectId("59380657bbdbfb36c18a80f2") }
    ]
},
{
    "_id" : ObjectId("59380657bbdbfb36c18a80f3"),
    "name" : "Root node 2",
    "children" : [ ]
}

I have no idea how to lookup children recursively to get "Subchild 1.1.1" in the children collection of "Child 1.1". I'm looking for any suggestions. Thanks :)

like image 420
Jeremy Barthe Avatar asked Jun 07 '17 14:06

Jeremy Barthe


2 Answers

$graphLookup is not producing the hierarchy of dependencies - it performs a recursive search of connected documents, but results are flattened into the single-dimension array. Here is the quote from the documentation:

For each matching document, $graphLookup takes the value of the _id and checks every document in the tree collection for a matching parent value. For each match, $graphLookup adds the matching document in the from collection to an array children. This step continues recursively until no more matching documents are found, or until the operation reaches a recursion depth specified by the maxDepth parameter.

I.e. it searches for dependent documents recursively, but each found document is added to same children array of the parent document no matter how 'deep' the child is located.


Note - you don't see Child 1.1 with its connected Subchild 1.1.1 because you are filtering out these documents in match stage:

{ $match: { parent: { $exists: false } } }

that selects only documents which don't have a parent - "Root node 1" and "Root node 2". If you will remove this filter, then all other documents with the hierarchy of their dependants will be returned:

{
    "name" : "Child 1.1",
    "children" : [ 
        { "name" : "Subchild 1.1.1" }
    ]
},
{
    "name" : "Child 1.2"
    "children" : []
},
{
    "name" : "Root node 1",
    "children" : [ 
        { "name" : "Subchild 1.1.1" }, 
        { "name" : "Child 1.2" }, 
        { "name" : "Child 1.1" }
    ]
},
{
    "name" : "Root node 2",
    "children" : []
},
{
    "name" : "Subchild 1.1.1"
    "children" : []
}

If you don't want to mix children from different 'depth' of tree in single children array, then take a look at interesting comment in documentation

Setting the maxDepth field to 0 is equivalent to a non-recursive $lookup search stage.

It means that each document will get all its direct children into the children array, and after that lookup will stop without any further recursive search. Output will be

{
    "name" : "Child 1.1",
    "children" : [ 
        { "name" : "Subchild 1.1.1" }
    ]
},
{
    "name" : "Child 1.2"
    "children" : []
},
{
    "name" : "Root node 1",
    "children" : [ 
        { "name" : "Child 1.2" }, 
        { "name" : "Child 1.1" }
    ]
},
{
    "name" : "Root node 2",
    "children" : []
},
{
    "name" : "Subchild 1.1.1"
    "children" : []
}
like image 170
Sergey Berezovskiy Avatar answered Nov 14 '22 23:11

Sergey Berezovskiy


Had not found the way to solve it with a maxDepth parameter, but the depthField one (set simply to 'depth', in my case) came to help, and Jeremy Barthe suggestion to recompute results.

Here is a function that recursively rebuilds the $graphLookup result.

def list_tree_lookup(keys, data, depth=0):
    result = []
    for a in data:
        if a['depth'] + depth == 0 or (a['depth'] == depth and a['_id'] in keys):
            b = copy.deepcopy(a)
            b['children'] = list_tree_lookup(a['children'], data, depth+1)
            del b['depth']
            result.append(b)
    return result

I used import copy and copy.deepcopy just to be sure everything stay in place.

So, when you have

rootNode1 = {
    "_id" : "Root node 1",
    "children" : [
        { "_id" : "Subchild 1.1.1", "depth" : 1 },
        { "_id" : "Child 1.2", "depth" : 0 },
        { "_id" : "Child 1.1", "depth" : 0 }
    ]
}

calling list_tree_lookup([], rootNode1['children']) will produce

[
    {
        "_id": "Child 1.1"
        "children": [
            { "_id": "Subchild 1.1.1" }
        ]
    },
    {
        "_id": "Child 1.2"
        "children": []
    }
]

and it can be assigned back to rootNode1['children'].

like image 20
imy Avatar answered Nov 14 '22 22:11

imy