Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mongoDB aggregation: $addToSet then $sort

I'm trying to sort unique values in arrays in multiple fields from a mongoDB collection (with the nodeJS driver).

a small dataset :

[{
    "_id" : "5c93db3dd0184516406013f7",
    "filters" : {
        "genres" : [ 
            {
                "_id" : "9CXBYc4qP8sqcNMZ5",
                "fr" : "Art Abstrait",
                "en" : "Abstract Art",
                "de" : "Abstrakte Kunst",
                "it" : "Arte astratta",
                "es" : "Arte Abstracto"
            }
        ],
        "subjects" : [ 
            {
                "_id" : "3QjL6YSfmuY6NFHGG",
                "fr" : "Abstrait",
                "en" : "Abstract",
                "de" : "Abstrakt",
                "it" : "Astratto",
                "es" : "Abstracto"
            }
        ],
        "type" : {
            "_id" : "CYK2WcepkJsy5xXMo",
            "fr" : "Gravure au carborundum",
            "en" : "Carborundum etching",
            "de" : "Carborundum Radierung",
            "it" : "Incisione carborandum",
            "es" : "Grabado al Carborundum"
        }
    }
},
{
    "_id" : "5c93db3ed0184516406013f8",
    "filters" : {
        "genres" : [ 
            {
                "_id" : "9CXBYc4qP8sqcNMZ5",
                "fr" : "Art Abstrait",
                "en" : "Abstract Art",
                "de" : "Abstrakte Kunst",
                "it" : "Arte astratta",
                "es" : "Arte Abstracto"
            }
        ],
        "subjects" : [ 
            {
                "_id" : "3QjL6YSfmuY6NFHGG",
                "fr" : "Abstrait",
                "en" : "Abstract",
                "de" : "Abstrakt",
                "it" : "Astratto",
                "es" : "Abstracto"
            }
        ],
        "type" : {
            "_id" : "CYK2WcepkJsy5xXMo",
            "fr" : "Gravure au carborundum",
            "en" : "Carborundum etching",
            "de" : "Carborundum Radierung",
            "it" : "Incisione carborandum",
            "es" : "Grabado al Carborundum"
        }
    }
},
{
    "_id" : "5c93e19ed018451640601da6",
    "filters" : {
        "genres" : [ 
            {
                "_id" : "9CXBYc4qP8sqcNMZ5",
                "fr" : "Art Abstrait",
                "en" : "Abstract Art",
                "de" : "Abstrakte Kunst",
                "it" : "Arte astratta",
                "es" : "Arte Abstracto"
            }
        ],
        "subjects" : [ 
            {
                "_id" : "3QjL6YSfmuY6NFHGG",
                "fr" : "Abstrait",
                "en" : "Abstract",
                "de" : "Abstrakt",
                "it" : "Astratto",
                "es" : "Abstracto"
            }
        ],
        "type" : {
            "_id" : "KfGWEHL2pAto8nfze",
            "fr" : "Gravure",
            "en" : "Etching",
            "de" : "Radierung",
            "it" : "Incisione",
            "es" : "Grabado"
        }
    }
}]

the result of my query (with lang = 'en'):

{
  "subjects": [
    {
      "_id": "3QjL6YSfmuY6NFHGG",
      "fr": "Abstrait",
      "en": "Abstract",
      "de": "Abstrakt",
      "it": "Astratto",
      "es": "Abstracto"
    },
    {
      "_id": "3QjL6YSfmuY6NFHGG",
      "fr": "Abstrait",
      "en": "Abstract",
      "de": "Abstrakt",
      "it": "Astratto",
      "es": "Abstracto"
    }
  ],
  "genres": [
    {
      "_id": "9CXBYc4qP8sqcNMZ5",
      "fr": "Art Abstrait",
      "en": "Abstract Art",
      "de": "Abstrakte Kunst",
      "it": "Arte astratta",
      "es": "Arte Abstracto"
    },
    {
      "_id": "9CXBYc4qP8sqcNMZ5",
      "fr": "Art Abstrait",
      "en": "Abstract Art",
      "de": "Abstrakte Kunst",
      "it": "Arte astratta",
      "es": "Arte Abstracto"
    }
  ],
  "types": [
    {
      "_id": "CYK2WcepkJsy5xXMo",
      "fr": "Gravure au carborundum",
      "en": "Carborundum etching",
      "de": "Carborundum Radierung",
      "it": "Incisione carborandum",
      "es": "Grabado al Carborundum"
    },
    {
      "_id": "KfGWEHL2pAto8nfze",
      "fr": "Gravure",
      "en": "Etching",
      "de": "Radierung",
      "it": "Incisione",
      "es": "Grabado"
    }
  ]
}

The pipeline for the aggregation :

[
    { $unwind: '$filters.subjects' },
    { $unwind: '$filters.genres' },
    { $group: {
      _id: null,
      subjects: { $addToSet: '$filters.subjects' },
      types: { $addToSet: '$filters.type' },
      genres: { $addToSet: '$filters.genres' },
    }},
    { $unwind: '$subjects' },
    { $unwind: '$genres' },
    { $unwind: '$types' },
    { $sort: {
      [`subjects.${lang}`]: 1,
      [`types.${lang}`]: 1,
      [`genres.${lang}`]: 1,
    }},
    { $group: {
      _id: null,
      subjects: { $push: '$subjects' },
      types: { $push: '$types' },
      genres: { $push: '$genres' },
    }},
    { $project: {
      _id: false,
      subjects: '$subjects',
      types: '$types',
      genres: '$genres'
    }}
]

Instead of getting sorted arrays of unique values as so : [A, B, C, D, ...]

I get sorted arrays with non unique values as so : [A, A, A, B, B, B, C, C, C, D, D, D, ...]

Making the $addToSet grouping useless.

Any idea on what I got wrong ?

like image 657
Nathan Schwarz Avatar asked Mar 15 '26 03:03

Nathan Schwarz


1 Answers

The problem you're encountering is that each $unwind is going to create a copy of the document with a single array element from the array you're unwinding. You have the following:

...
{ $unwind: '$subjects' },
{ $unwind: '$genres' },
{ $unwind: '$types' },
...

So, first you're unwinding subjects which produces a documents for each element in subjects, which we will call subject. Thus we have a document for each subject which themselves contain the arrays genres and types. On unwinding genres next, each subject document gets unwinded to contain an element genre from genres. This gives use genres.length copies of each subject--that is, each subject is duplicated based on how many genres there are in the array. A similar situation occurs when unwinding types.

In short, you're duplicating your data on every $unwind call.

To illustrate with a simpler example:

// Doc:
{
    ints: [1, 2],
    alpha: ['a', 'b', 'c']

}

// Pipeline:
[
    { $unwind: "$ints" },
    { $unwind: "$alpha" }

]

// After unwinding "ints":
[
    { ints: 1, alpha: ['a', 'b', 'c'] },
    { ints: 2, alpha: ['a', 'b', 'c'] }

]

// After unwinding "alpha":
[
    { ints: 1, alpha: 'a' },
    { ints: 1, alpha: 'b' },
    { ints: 1, alpha: 'c' },
    { ints: 2, alpha: 'a' },
    { ints: 2, alpha: 'b' },
    { ints: 2, alpha: 'c' }
]

// Result: 3 duplicates of each value in "ints", 2 duplicates of each value in "alpha".

To resolve this, a couple of options immediately come to mind:
1. You can $unwind an array, $sort it, and $group the results to $push the elements back into an array, repeating for each array individually, one at a time. Note that you would need to use the $first operator to grab only one copy of each duplicated array when grouping.
2. You can change your last $group pipeline stage to use $addToSet instead of $push operations.

There are likely other options available to you, but either of the above should suffice for something quick.

like image 154
B. Fleming Avatar answered Mar 18 '26 00:03

B. Fleming



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!