Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mongo $unwind and $group

I have two collections. One of which I wish to add a reference to the other and have it populated on return.

Here is an example json I am trying to achieve as the result:

{
  "title": "Some Title",
  "uid": "some-title",
  "created_at": "1412159926",
  "updated_at": "1412159926",
  "id": "1",
  "metadata": {
    "date": "2016-10-17",
    "description": "a description"
  },
  "tags": [
    {
      "name": "Tag 1",
      "uid": "tag-1"
    },
    {
      "name": "Tag 2",
      "uid": "tag-2"
    },
    {
      "name": "Tag 3",
      "uid": "tag-3"
    }
  ]
}

Here is the mongo query I have which gets my close, but it nests the original body of the item within the _id object.

db.tracks.aggregate([{
    $unwind: "$tags"
}, {
    $lookup: {
        from: "tags",
        localField: "tags",
        foreignField: "_id",
        as: "tags"
    }
}, {
    $unwind: "$tags"
}, {
    $group: {
        "_id": {
            "title": "$title",
            "uid": "$uid",
            "metadata": "$metadata"
        },
        "tags": {
            "$push": "$tags"
        }
    }
}])

So the result is this:

{
    "_id" : {
        "title" : "Some Title",
        "uid" : "some-title",
        "metadata" : {
            "date" : "2016-10-17",
            "description" : "a description"
        }
    },
    "tags" : [ 
        {
            "_id" : ObjectId("580499d06fe29ce7093fb53a"),
            "name" : "Tag 1",
            "uid" : "tag-1"
        }, 
        {
            "_id" : ObjectId("580499d06fe29ce7093fb53b"),
            "name" : "Tag 2",
            "uid" : "tag-2"
        }
    ]
}

Is there a way to achieve the desired output? Also is there a way to not have to define in the $group all the fields which I wish to return, I would like to return the original Object but with the referenced documents in the tags array.

like image 878
mchaffe Avatar asked Oct 17 '16 10:10

mchaffe


People also ask

What is the use of $unwind in MongoDB?

What is MongoDB $unwind? The MongoDB $unwind operator is used to deconstruct an array field in a document and create separate output documents for each item in the array.

What is $Group in MongoDB?

The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key. A group key is often a field, or group of fields. The group key can also be the result of an expression.

How do I join two collections in MongoDB?

For performing MongoDB Join two collections, you must use the $lookup operator. It is defined as a stage that executes a left outer join with another collection and aids in filtering data from joined documents. For example, if a user requires all grades from all students, then the below query can be written: Students.

What does $project do in MongoDB?

The $project takes a document that can specify the inclusion of fields, the suppression of the _id field, the addition of new fields, and the resetting of the values of existing fields. Alternatively, you may specify the exclusion of fields. Specifies the inclusion of a field.


1 Answers

Since you had initially pivoted your original documents on the tags array field which means the documents will be denormalized, your $group pipeline should use the _id field as its _id key and access the other fields using the $first or $last operator.

The group pipeline operator is similar to the SQL's GROUP BY clause. In SQL, you can't use GROUP BY unless you use any of the aggregation functions. The same way, we have to use an aggregation function in MongoDB as well, so unfortunately there is no other way of not having to define in the $group pipeline all the fields which you wish to return apart from using the $first or $last operator on each field:

db.tracks.aggregate([
    { "$unwind": "$tags" }, 
    {
        "$lookup": {
            "from": "tags",
            "localField": "tags",
            "foreignField": "_id",
            "as": "resultingArray"
        }
    }, 
    { "$unwind": "$resultingArray" },
    {
        "$group": {
            "_id": "$_id",
            "title": { "$first": "$title" },
            "uid": { "$first": "$uid" },
            "created_at": { "$first": "$created_at" },
            "updated_at": { "$first": "$updated_at" },
            "id": { "$first": "$id" },
            "metadata": { "$first": "$metadata" },
            "tags": { "$push": "$resultingArray" }
        }
    }
])

One trick I always use whenever I want to debug a pipeline that's giving unexpected results is to run the aggregation with just the first pipeline operator. If that gives the expected result, add the next.

In the answer above, you'd first try aggregating just the $unwind; if that works, add the $lookup. This can help you narrow down which operator is causing issues. In this case, you could run the pipeline with just the first three steps since you believe the $group is the one causing issues and then inspect the resulting documents from that pipeline:

db.tracks.aggregate([
    { "$unwind": "$tags" }, 
    {
        "$lookup": {
            "from": "tags",
            "localField": "tags",
            "foreignField": "_id",
            "as": "resultingArray"
        }
    }, 
    { "$unwind": "$resultingArray" }
])

which yields the output

/* 1 */
{
    "_id" : ObjectId("5804a6c900ce8cbd028523d9"),
    "title" : "Some Title",
    "uid" : "some-title",
    "created_at" : "1412159926",
    "updated_at" : "1412159926",
    "id" : "1",
    "metadata" : {
        "date" : "2016-10-17",
        "description" : "a description"
    },
    "resultingArray" : {
        "name" : "Tag 1",
        "uid" : "tag-1"
    }
}

/* 2 */
{
    "_id" : ObjectId("5804a6c900ce8cbd028523d9"),
    "title" : "Some Title",
    "uid" : "some-title",
    "created_at" : "1412159926",
    "updated_at" : "1412159926",
    "id" : "1",
    "metadata" : {
        "date" : "2016-10-17",
        "description" : "a description"
    },
    "resultingArray" : {
        "name" : "Tag 2",
        "uid" : "tag-2"
    }
}

/* 3 */
{
    "_id" : ObjectId("5804a6c900ce8cbd028523d9"),
    "title" : "Some Title",
    "uid" : "some-title",
    "created_at" : "1412159926",
    "updated_at" : "1412159926",
    "id" : "1",
    "metadata" : {
        "date" : "2016-10-17",
        "description" : "a description"
    },
    "resultingArray" : {
        "name" : "Tag 3",
        "uid" : "tag-3"
    }
}

From inspection you will see that for each input document, the last pipeline outputs 3 documents where 3 is the number of array elements in the computed field resultingArray and they all have a common _id and the other fields with the exception of the resultingArray field which is different, thus you get your desired results by adding a pipeline that groups the documents by the _id field and subsequently getting the other fields with $first or $last operator, as in the given solution:

db.tracks.aggregate([
    { "$unwind": "$tags" }, 
    {
        "$lookup": {
            "from": "tags",
            "localField": "tags",
            "foreignField": "_id",
            "as": "resultingArray"
        }
    }, 
    { "$unwind": "$resultingArray" },
    {
        "$group": {
            "_id": "$_id",
            "title": { "$first": "$title" },
            "uid": { "$first": "$uid" },
            "created_at": { "$first": "$created_at" },
            "updated_at": { "$first": "$updated_at" },
            "id": { "$first": "$id" },
            "metadata": { "$first": "$metadata" },
            "tags": { "$push": "$resultingArray" }
        }
    }
])
like image 60
chridam Avatar answered Oct 26 '22 12:10

chridam