Keep the main document when filtered array is empty in aggregation pipeline

Question

This is really an addition to a previously asked question. With some help from @JohnnyHK I can now remove unwanted subdocuments from an array based on a particular criteria: deleted != null. I discovered that my $unwind pipe broke when there were no items in the array so I also implemented a $cond to add in a dummy object where the array was empty. Here is the code so far:

Collection.aggregate([
        { $match:
            { _id: ObjectID(collection_id) }
        },
        {
          $project:
            {
              _id: 1,
              name: 1,
              images: {
                $cond:
                  [
                    {
                      $eq:
                        [
                          "$images",
                          []
                        ]
                    },
                    [
                      dummyImg // Variable containing dummy object
                    ],
                    '$images'
                  ]
              },
            }
        },
        { $unwind: "$images" },
        { $match:
            { "images.deleted": null }
        },

        // Regroup the docs by _id to reassemble the images array
        {$group: {
            _id: '$_id',
            name: {$first: '$name'},
            images: {$push: '$images'}
        }}

    ], function (err, result) {
    if (err) {
        console.log(err);
        return;
    }
    console.log(result);
});

The problem now occurs when the array is not empty but only contains objects where deleted is null. I $unwind the images but the $match doesn't find any matches so can't the perform the final $group.

I'm thinking along the lines of pushing in the dummy object at the beginning of the pipeline and then counting the images towards the end. The dummy object will stay if it is the only one but will need removing if there are other image objects that have made it through the pipeline.

If this is a sensible route I'd be glad of some pointers. If my thinking is way off, any tips to steer me in the right direction will be gratefully received.

Thank you.

Neil Lunn · Accepted Answer

Modern MongoDB releases of course simply apply $filter and $addFields to write the filtered and possibly empty array result into the document:

Collection.aggregate([
  { "$addFields": {
    "images": {
      "$filter": {
        "input": "$filter",
        "as": "i",
        "cond": { "$eq": [ "$$i.deleted", null ] }
      }
    }
  }}
])

The answer you were given before wasn't a very modern or efficient answer and there are since other and better ways to deal with filtering the content from arrays than doing $unwind, $match and $group.

As on MongoDB 2.6 you can do this with unique array identifiers:

Collection.aggregate([
    { "$project": {
        "name": 1,
        "images": { "$cond": [
            { "$eq": [{ "$size": { "$ifNull": [ "$images",[]] }}, 0] },
            { "$ifNull": [ "$images", [] ] },
            { "$setDifference": [
                { "$map": {
                    "input": "$images",
                    "as": "i",
                    "in": { "$cond": [
                        { "$eq": [ "$$i.deleted", null ] },
                        "$$i",
                        false
                    ]}
                }},
                [false]
            ]}
        ]}
    }}
],

The $map operator transforms arrays in place in the document by returning each inspected element evaluated by the given expression. Here you can use $cond in order to test the field value and decide whether to return the field as is or otherwise return false.

The $setDifference operation "compares" the resulting transformed array with the other singular element array [false]. This has the effect of removing all items that did not match from the array and leaves behind just that and even an empty array where there were no matches.

The following with $redact is safe as long as your document does not contain the same referenced property at multiple levels of the document. The somewhat funny looking condition is because a "deleted": null property is actually being projected ( for evaluation purposes ) at the levels where it does not exists. This is needed because $redact is being used in a "recursive" way, descending the document tree to decide what it gets rid of, or "redacts":

Collection.aggregate([
    { "$redact": {
        "$cond": [ 
            { "$eq": [ { "$ifNull": [ "$deleted", null ] }, null ] },
            "$$DESCEND",
            "$$PRUNE"
        ]
    }}
]

That really is the most simple logic to implement for your specific purpose. Just remember that you likely couldn't use this later if you added another "deleted" field in your document somehow meaning something else.

If you really are stuck with a version of MongoDB earlier than 2.6 and don't have access to those operations, then of course you need to do the $unwind, $match and $groupprocess. So care needs to be taken both at the beginning with empty or missing arrays as well as when matching arrays with no matching entries.

One approach:

Collection.aggregate([
    // Cater for missing or empty arrays
    { "$project": {
        "name": 1,
        "images": { "$cond": [
            { "$eq": [{ "$ifNull": [ "$images", [] ] }, [] ] },
            { "$const": [{ "deleted": false }] },
            "$images"
        ]}
    }},

    // Safe to unwind
    { "$unwind": "$images" },

    // Just count the matched array entries first
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "images": { "$push": "$images" },
        "count": { "$sum": { "$cond": [
            { "$eq": [ "$images.deleted", null ] },
            1,
            0
        ]}}
    }},

    // Unwind again
    { "$unwind": "$images" },

    // Match either non deleted or unmatched array
    { "$match": { 
        "$or": [
            { "images.deleted": null},
            {  "count": 0 }
        ]
    }},

    // Group back with the things that were matched
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "images": { "$push": "$images" },
        "count": { "$first": "$count" }
    }},

    // Replace the un-matched arrays with empty ones
    { "$project": {
         "name": 1,
         "images": { "$cond": [
             { "$eq": [ "$count", 0 ] },
             [],
             "$images"
         ]}
    }}
],

So there is a bit more lifting there but the general principle is to get a "count" of the matched elements only and when you are filtering you also keep the array items with a 0 "count" but just replace those whole arrays later.

You can also consider here that if you maintained and "activeCount" field on your document in the first place then you would remove the need to calculate that and drop a few stages.

And of course the other argument here is that you could save yourself the trouble of this by actually maintaining both "active" and "deleted" items in separate arrays. Doing that with each update removes any need to filter via aggregate. I suppose though it all depends on your real purposes. But food fo thought.

Of course this is all tested based on your original data with some modifications to suit the test cases:

{
    "_id" : ObjectId("54ec9cac83a214491d2110f4"),
    "name" : "my_images",
    "images" : [
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2311026b0cb289ed04188"),
            "deleted" : null,
            "date_added" : ISODate("2015-02-28T21:20:16.961Z")
        },
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2314a26b0cb289ed04189"),
            "deleted" : ISODate("2015-02-24T15:38:14.826Z"),
            "date_added" : ISODate("2015-02-28T21:21:14.910Z")
        },
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2315526b0cb289ed0418a"),
            "deleted" : null,
            "date_added" : ISODate("2015-02-28T21:21:25.042Z")
        },
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2315d26b0cb289ed0418b"),
            "deleted" : null,
            "date_added" : ISODate("2015-02-28T21:21:33.081Z")
        }
    ]
},
{
    "_id" : ObjectId("54fa6ca87c105bc872cc1886"),
    "name" : "another",
    "images" : [ ]
},
{
    "_id" : ObjectId("54fa6cef7c105bc872cc1887"),
    "name" : "final",
    "images" : [
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2314a26b0cb289ed04189"),
            "deleted" : ISODate("2015-02-24T15:38:14.826Z"),
            "date_added" : ISODate("2015-02-28T21:21:14.910Z")
        }
    ]
}

Which all versions produce with safe results:

{
    "_id" : ObjectId("54ec9cac83a214491d2110f4"),
    "name" : "my_images",
    "images" : [
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2311026b0cb289ed04188"),
            "deleted" : null,
            "date_added" : ISODate("2015-02-28T21:20:16.961Z")
        },
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2315526b0cb289ed0418a"),
            "deleted" : null,
            "date_added" : ISODate("2015-02-28T21:21:25.042Z")
        },
        {
            "ext" : "jpeg",
            "type" : "image/jpeg",
            "_id" : ObjectId("54f2315d26b0cb289ed0418b"),
            "deleted" : null,
            "date_added" : ISODate("2015-02-28T21:21:33.081Z")
        }
    ]
},
{
    "_id" : ObjectId("54fa6ca87c105bc872cc1886"),
    "name" : "another",
    "images" : [ ]
},
{
    "_id" : ObjectId("54fa6cef7c105bc872cc1887"),
    "name" : "final",
    "images" : [ ]
}

Keep the main document when filtered array is empty in aggregation pipeline

Tags:

javascript

mongodb

mongodb-query

mongoose

aggregation-framework

Simon

1 Answers

Neil Lunn

Recent Activity

Donate For Us

Keep the main document when filtered array is empty in aggregation pipeline

Tags:

javascript

mongodb

mongodb-query

mongoose

aggregation-framework

Simon

1 Answers

Neil Lunn

Related questions

Recent Activity

Donate For Us