Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge array field in document in Mongo aggregation

I have one requirement where i need to do aggregation on two records both have an array field with different value. What I need that when I do aggregation on these records the result should have one array with unique values from both different arrays. Here is example :

First record

 { Host:"abc.com" ArtId:"123", tags:[ "tag1", "tag2" ] }

Second record

{ Host:"abc.com" ArtId:"123", tags:[ "tag2", "tag3" ] }

After aggregation on host and artid i need result like this:

 { Host: "abc.com", ArtId: "123", count :"2", tags:[ "tag1", "tag2", "tag3" ]}

I tried $addToset in group statement but it gives me like this tags :[["tag1","tag2"],["tag2","tag3"]]

Could you please help me how i can achieve this in aggregation

like image 405
viren Avatar asked Nov 13 '14 08:11

viren


People also ask

How do I merge two MongoDB aggregate collections?

For performing MongoDB Join two collections, you must use the $lookup operator. It is defined as a stage that executes a left outer join with another collection and aids in filtering data from joined documents. For example, if a user requires all grades from all students, then the below query can be written: Students.

How do I merge objects in MongoDB?

$mergeObjects overwrites the field values as it merges the documents. If documents to merge include the same field name, the field, in the resulting document, has the value from the last document merged for the field.

How do I append to an array in MongoDB?

In MongoDB, the $push operator is used to appends a specified value to an array. If the mentioned field is absent in the document to update, the $push operator add it as a new field and includes mentioned value as its element. If the updating field is not an array type field the operation failed.


1 Answers

TLDR;

Modern releases should use $reduce with $setUnion after the initial $group as is shown:

db.collection.aggregate([
  { "$group": {
    "_id": { "Host": "$Host", "ArtId": "$ArtId" },
    "count": { "$sum": 1 },
    "tags": { "$addToSet": "$tags" }
  }},
  { "$addFields": {
    "tags": {
      "$reduce": {
        "input": "$tags",
        "initialValue": [],
        "in": { "$setUnion": [ "$$value", "$$this" ] }
      }
    }
  }}
])

You were right in finding the $addToSet operator, but when working with content in an array you generally need to process with $unwind first. This "de-normalizes" the array entries and essentially makes a "copy" of the parent document with each array entry as a singular value in the field. That's what you need to avoid the behavior you are seeing without using that.

Your "count" poses an interesting problem though, but easily solved through the use of a "double unwind" after an initial $group operation:

db.collection.aggregate([
    // Group on the compound key and get the occurrences first
    { "$group": {
        "_id": { "Host": "$Host", "ArtId": "$ArtId" },
        "tcount": { "$sum": 1 },
        "ttags": { "$push": "$tags" }
    }},

    // Unwind twice because "ttags" is now an array of arrays
    { "$unwind": "$ttags" },
    { "$unwind": "$ttags" },

    // Now use $addToSet to get the distinct values        
    { "$group": {
        "_id": "$_id",
        "tcount": { "$first": "$tcount" },
        "tags": { "$addToSet": "$ttags" }
    }},

    // Optionally $project to get the fields out of the _id key
    { "$project": {
        "_id": 0,
        "Host": "$_id.Host",
        "ArtId": "$_id.ArtId",
        "count": "$tcount",
        "tags": "$ttags"
    }}
])

That final bit with $project is also there because I used "temporary" names for each of the fields in other stages of the aggregation pipeline. This is because there is an optimization in $project that "copies" the fields from an existing stage in the order they already appeared "before" any "new" fields are added to the document.

Otherwise the output would look like:

{  "count":2 , "tags":[ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }

Where the fields are not in the same order as you might think. Trivial really, but it matters to some people, so worth explaining why, and how to handle.

So $unwind does the work to keep the items separated and not in arrays, and doing the $group first allows you to get the "count" of the occurrences of the "grouping" key.

The $first operator used later "keeps" that "count" value, as it just got "duplicated" for every value present in the "tags" array. It's all the same value anyway so it does not matter. Just pick one.

like image 80
Neil Lunn Avatar answered Sep 18 '22 15:09

Neil Lunn