I am trying to formulate a query over the sample bios collection http://docs.mongodb.org/manual/reference/bios-example-collection/:
Retrieve all persons and their awards that they received before receiving a Turing award.
I have come up with this query:
db.bios.aggregate([
{$match: {"awards.award" : "Turing Award"}},
{$project: {"award1": "$awards", "award2": "$awards", "first_name": "$name.first", "last_name": "$name.last"}},
{$unwind: "$award1"},
{$match: {"award1.award" : "Turing Award"}},
{$unwind: "$award2"},
{$redact: {
$cond: {
if: { $eq: [ { $gt: [ "$award1.year", "$award2.year"] }, true]},
then: "$$KEEP",
else: "$$PRUNE"
}
}
}
])
And this is the answer:
/* 0 */
{
"result" : [
{
"_id" : 1,
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : {
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
"first_name" : "John",
"last_name" : "Backus"
},
{
"_id" : 1,
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : {
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
"first_name" : "John",
"last_name" : "Backus"
},
{
"_id" : 4,
"award1" : {
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
"award2" : {
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
"first_name" : "Kristen",
"last_name" : "Nygaard"
},
{
"_id" : 5,
"award1" : {
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
"award2" : {
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
"first_name" : "Ole-Johan",
"last_name" : "Dahl"
}
],
"ok" : 1
}
What I don't like about this solution is that I unwind $award2
. Instead, I would be happy to keep award2 as an array, and only remove those awards that were received after award1. So, for instance, the answer for John Backus should be:
{
"_id" : 1,
"first_name" : "John",
"last_name" : "Backus",
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
}
]
}
Is it possible to achieve it with $redact
without doing $unwind: "$award2"
?
Filter MongoDB Array Element Using $Filter Operator This operator uses three variables: input – This represents the array that we want to extract. cond – This represents the set of conditions that must be met. as – This optional field contains a name for the variable that represent each element of the input array.
Description. The MongoDB $unwind stages operator is used to deconstructing an array field from the input documents to output a document for each element. Every output document is the input document with the value of the array field replaced by the element.
Definition. $redact. Restricts the contents of the documents based on information stored in the documents themselves.
It might have been a little more helpful if you had included the original state of the document as an example in your question as this clearly shows "where you are coming from" and then to "where you want to get to" as a goal in addition to your desired output as given.
That's just a tip, but it seems that you are starting with a document like this:
{
"_id" : 1,
"name": {
"first" : "John",
"last" : "Backus"
},
"awards" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
{
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
{
"award" : "Some other award",
"year" : 1979,
"by" : "Someone Else"
}
]
}
So the real points here is that while you may haved reached for $redact
here (and it is a bit nicer than using $project
for a logical condition and then using $match
to filter that logical match ) this probably isn't the best tool for the comparison you want to do here.
Before moving on I just want to point out the main problem here with $redact
. Whatever you do here the logic ( without an unwind ) would be essentially to compare "directly" on $$DESCEND
in order to process the the array elements on the value of "year" at whatever level.
That recursion is going to invalidate the "award1" condition as well since it has the same field name. Even renaming that field kills the logic since a projected value where it was missing would not be greater than the tested value.
In a nutshell, $redact
is ruled right out since you cannot say "take from here only" with the logic it applies.
The alternate is to use $map
and $setDifference
to filter contents from the arrays as follows:
db.bios.aggregate([
{ "$match": { "awards.award": "Turing Award" } },
{ "$project": {
"first_name": "$name.first",
"last_name": "$name.last",
"award1": { "$setDifference": [
{ "$map": {
"input": "$awards",
"as": "a",
"in": { "$cond": [
{ "$eq": [ "$$a.award", "Turing Award" ] },
"$$a",
false
]}
}},
[false]
]},
"award2": { "$setDifference": [
{ "$map": {
"input": "$awards",
"as": "a",
"in": { "$cond": [
{ "$ne": [ "$$a.award", "Turing Award" ] },
"$$a",
false
]}
}},
[false]
]}
}},
{ "$unwind": "$award1" },
{ "$project": {
"first_name": 1,
"last_name": 1,
"award1": 1,
"award2": { "$setDifference": [
{ "$map": {
"input": "$award2",
"as": "a",
"in": { "$cond": [
{ "$gt": [ "$award1.year", "$$a.year" ] },
"$$a",
false
]}
}},
[false]
]}
}}
])
And there really is no "pretty" way of getting around either the usage of $unwind
in the itermediatary stage or even the second $project
here, since $map
( and the $setDifference
filter ) returns what is "still an array". So the $unwind
is necessary to make the "array" a singular ( provided your condition only matches 1 element ) entry for which to use in comparison.
Trying to "squish" all the logic in a single $project
will only result in "arrays of arrays" in the second output, and still some "unwinding" therefore required, but at least this way unwinding the (hopefully) 1 match is not really that costly and keeps the output clean.
But the other thing to really note here is that you are not really "aggregating" anything here at all. This is just document manipulation, so you might well consider to just do that manipulation directly in client code. As demonstrated with this shell example:
db.bios.find(
{ "awards.award": "Turing Award" },
{ "name": 1, "awards": 1 }
).forEach(function(doc) {
doc.first_name = doc.name.first;
doc.last_name = doc.name.last;
doc.award1 = doc.awards.filter(function(award) {
return award.award == "Turing Award"
})[0];
doc.award2 = doc.awards.filter(function(award) {
return doc.award1.year > award.year;
});
delete doc.name;
delete doc.awards;
printjson(doc);
})
At any rate, both approaches will output the same:
{
"_id" : 1,
"first_name" : "John",
"last_name" : "Backus",
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
}
]
}
The only real difference here is that by using .aggregate()
the content of "award2" will already be filtered when returned from the server, which probably isn't going to be that much different from doing the client processing approach unless the items that would be removed comprises a reasonably large list per document.
For the record, the only alteration to your existing aggregation pipeline really required here would be to add a $group
to the end to "re-combine" the array entries into a single document:
db.bios.aggregate([
{ "$match": { "awards.award": "Turing Award" } },
{ "$project": {
"first_name": "$name.first",
"last_name": "$name.last",
"award1": "$awards",
"award2": "$awards"
}},
{ "$unwind": "$award1" },
{ "$match": {"award1.award" : "Turing Award" }},
{ "$unwind": "$award2" },
{ "$redact": {
"$cond": {
"if": { "$gt": [ "$award1.year", "$award2.year"] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$group": {
"_id": "$_id",
"first_name": { "$first": "$first_name" },
"last_name": { "$first": "$last_name" },
"award1": { "$first": "$award1" },
"award2": { "$push": "$award2" }
}}
])
But then again, there is all that "array duplication" and the "cost of unwind" associated with all the operations here. So either of the first two approaches is what you really want in order to avoid that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With