Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB $redact to filter out some elements of an array

I am trying to formulate a query over the sample bios collection http://docs.mongodb.org/manual/reference/bios-example-collection/:

Retrieve all persons and their awards that they received before receiving a Turing award.

I have come up with this query:

db.bios.aggregate([
    {$match: {"awards.award" : "Turing Award"}},
    {$project: {"award1": "$awards", "award2": "$awards", "first_name": "$name.first", "last_name": "$name.last"}},
    {$unwind: "$award1"},
    {$match: {"award1.award" : "Turing Award"}},
    {$unwind: "$award2"},
    {$redact: {
        $cond: {
           if: { $eq: [ { $gt: [ "$award1.year", "$award2.year"] }, true]},
           then: "$$KEEP",
           else: "$$PRUNE"
           }
        }
    }
])

And this is the answer:

/* 0 */
{
    "result" : [ 
    {
        "_id" : 1,
        "award1" : {
            "award" : "Turing Award",
            "year" : 1977,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "W.W. McDowell Award",
            "year" : 1967,
            "by" : "IEEE Computer Society"
        },
        "first_name" : "John",
        "last_name" : "Backus"
    }, 
    {
        "_id" : 1,
        "award1" : {
            "award" : "Turing Award",
            "year" : 1977,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "National Medal of Science",
            "year" : 1975,
            "by" : "National Science Foundation"
        },
        "first_name" : "John",
        "last_name" : "Backus"
    }, 
    {
        "_id" : 4,
        "award1" : {
            "award" : "Turing Award",
            "year" : 2001,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "Rosing Prize",
            "year" : 1999,
            "by" : "Norwegian Data Association"
        },
        "first_name" : "Kristen",
        "last_name" : "Nygaard"
    }, 
    {
        "_id" : 5,
        "award1" : {
            "award" : "Turing Award",
            "year" : 2001,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "Rosing Prize",
            "year" : 1999,
            "by" : "Norwegian Data Association"
        },
        "first_name" : "Ole-Johan",
        "last_name" : "Dahl"
    }
],
"ok" : 1
}

What I don't like about this solution is that I unwind $award2. Instead, I would be happy to keep award2 as an array, and only remove those awards that were received after award1. So, for instance, the answer for John Backus should be:

{
    "_id" : 1,
    "first_name" : "John",
    "last_name" : "Backus",
    "award1" : {
        "award" : "Turing Award",
        "year" : 1977,
        "by" : "ACM"
    },
    "award2" : [ 
        {
            "award" : "W.W. McDowell Award",
            "year" : 1967,
            "by" : "IEEE Computer Society"
        }, 
        {
            "award" : "National Medal of Science",
            "year" : 1975,
            "by" : "National Science Foundation"
        }
    ]
}

Is it possible to achieve it with $redact without doing $unwind: "$award2"?

like image 705
Elena Avatar asked Jun 25 '15 09:06

Elena


People also ask

How do I filter an array in MongoDB?

Filter MongoDB Array Element Using $Filter Operator This operator uses three variables: input – This represents the array that we want to extract. cond – This represents the set of conditions that must be met. as – This optional field contains a name for the variable that represent each element of the input array.

How do you unwind an array of objects in MongoDB?

Description. The MongoDB $unwind stages operator is used to deconstructing an array field from the input documents to output a document for each element. Every output document is the input document with the value of the array field replaced by the element.

What is redact in MongoDB?

Definition. $redact. Restricts the contents of the documents based on information stored in the documents themselves.


1 Answers

It might have been a little more helpful if you had included the original state of the document as an example in your question as this clearly shows "where you are coming from" and then to "where you want to get to" as a goal in addition to your desired output as given.

That's just a tip, but it seems that you are starting with a document like this:

{
    "_id" : 1,
    "name": { 
        "first" : "John",
        "last" : "Backus"
    },
    "awards" : [
        {
            "award" : "W.W. McDowell Award",
            "year" : 1967,
            "by" : "IEEE Computer Society"
        }, 
        {
            "award" : "National Medal of Science",
            "year" : 1975,
            "by" : "National Science Foundation"
        },
        { 
            "award" : "Turing Award",
            "year" : 1977,
            "by" : "ACM"
        },
        {
            "award" : "Some other award",
            "year" : 1979,
            "by" : "Someone Else"
        }
    ]
}

So the real points here is that while you may haved reached for $redact here (and it is a bit nicer than using $project for a logical condition and then using $match to filter that logical match ) this probably isn't the best tool for the comparison you want to do here.

Before moving on I just want to point out the main problem here with $redact. Whatever you do here the logic ( without an unwind ) would be essentially to compare "directly" on $$DESCEND in order to process the the array elements on the value of "year" at whatever level.

That recursion is going to invalidate the "award1" condition as well since it has the same field name. Even renaming that field kills the logic since a projected value where it was missing would not be greater than the tested value.

In a nutshell, $redact is ruled right out since you cannot say "take from here only" with the logic it applies.

The alternate is to use $map and $setDifference to filter contents from the arrays as follows:

db.bios.aggregate([
    { "$match": { "awards.award": "Turing Award" } },
    { "$project": {
        "first_name": "$name.first",
        "last_name": "$name.last",
        "award1": { "$setDifference": [
            { "$map": {
                "input": "$awards",
                "as": "a",
                "in": { "$cond": [
                    { "$eq": [ "$$a.award", "Turing Award" ] },
                    "$$a",
                    false
                ]}
            }},
            [false]
        ]},
        "award2": { "$setDifference": [
            { "$map": {
                "input": "$awards",
                "as": "a",
                "in": { "$cond": [
                    { "$ne": [ "$$a.award", "Turing Award" ] },
                    "$$a",
                    false
                ]}
            }},
            [false]
        ]}
    }},
    { "$unwind": "$award1" },
    { "$project": {
        "first_name": 1,
        "last_name": 1,
        "award1": 1,
        "award2": { "$setDifference": [
            { "$map": {
                "input": "$award2",
                "as": "a",
                "in": { "$cond": [
                     { "$gt": [ "$award1.year", "$$a.year" ] },
                     "$$a",
                     false
                 ]}
            }},
            [false]            
        ]}
    }}
])

And there really is no "pretty" way of getting around either the usage of $unwind in the itermediatary stage or even the second $project here, since $map ( and the $setDifference filter ) returns what is "still an array". So the $unwind is necessary to make the "array" a singular ( provided your condition only matches 1 element ) entry for which to use in comparison.

Trying to "squish" all the logic in a single $project will only result in "arrays of arrays" in the second output, and still some "unwinding" therefore required, but at least this way unwinding the (hopefully) 1 match is not really that costly and keeps the output clean.


But the other thing to really note here is that you are not really "aggregating" anything here at all. This is just document manipulation, so you might well consider to just do that manipulation directly in client code. As demonstrated with this shell example:

db.bios.find(
    { "awards.award": "Turing Award" },
    { "name": 1, "awards": 1 }
).forEach(function(doc) {
    doc.first_name = doc.name.first;
    doc.last_name = doc.name.last;
    doc.award1 = doc.awards.filter(function(award) {
        return award.award == "Turing Award"
    })[0];
    doc.award2 = doc.awards.filter(function(award) {
        return doc.award1.year > award.year;
    });
    delete doc.name;
    delete doc.awards;
    printjson(doc);
})

At any rate, both approaches will output the same:

{
    "_id" : 1,
    "first_name" : "John",
    "last_name" : "Backus",
    "award1" : {
            "award" : "Turing Award",
            "year" : 1977,
            "by" : "ACM"
    },
    "award2" : [
            {
                    "award" : "W.W. McDowell Award",
                    "year" : 1967,
                    "by" : "IEEE Computer Society"
            },
            {
                    "award" : "National Medal of Science",
                    "year" : 1975,
                    "by" : "National Science Foundation"
            }
    ]
}

The only real difference here is that by using .aggregate() the content of "award2" will already be filtered when returned from the server, which probably isn't going to be that much different from doing the client processing approach unless the items that would be removed comprises a reasonably large list per document.


For the record, the only alteration to your existing aggregation pipeline really required here would be to add a $group to the end to "re-combine" the array entries into a single document:

db.bios.aggregate([
    { "$match": { "awards.award": "Turing Award" } },
    { "$project": {
        "first_name": "$name.first", 
        "last_name": "$name.last",
        "award1": "$awards",
        "award2": "$awards"
    }},
    { "$unwind": "$award1" },
    { "$match": {"award1.award" : "Turing Award" }},
    { "$unwind": "$award2" },
    { "$redact": {
        "$cond": {
             "if": { "$gt": [ "$award1.year", "$award2.year"] },
             "then": "$$KEEP",
             "else": "$$PRUNE"
        }
    }},
    { "$group": {
        "_id": "$_id",
        "first_name": { "$first": "$first_name" },
        "last_name": { "$first": "$last_name" },
        "award1": { "$first": "$award1" },
        "award2": { "$push": "$award2" }
    }}
])

But then again, there is all that "array duplication" and the "cost of unwind" associated with all the operations here. So either of the first two approaches is what you really want in order to avoid that.

like image 124
Blakes Seven Avatar answered Sep 23 '22 08:09

Blakes Seven