In my MongoDB people collection I need to filter people with the same 'alias' property value, keeping the first one of them, and also keeping all people with a null 'alias'.
Data is like this:
{ "_id" : "1", "flag" : true, "name" : "Alice", "alias" : null, "dateOfBirth": new ISODate('1995-12-27T00:00:00.000Z') },
{ "_id" : "2", "flag" : true, "name" : "Bob", "alias" : "4c", "dateOfBirth": new ISODate('1996-12-27T00:00:00.000Z') },
{ "_id" : "3", "flag" : true, "name" : "Bobby", "alias" : "4c", "dateOfBirth": new ISODate('1997-12-27T00:00:00.000Z') },
{ "_id" : "4", "flag" : true, "name" : "Cristina", "alias" : null, "dateOfBirth": new ISODate('1998-12-27T00:00:00.000Z') },
{ "_id" : "5", "flag" : false, "name" : "Diego", "alias" : null, "dateOfBirth": new ISODate('1999-12-27T00:00:00.000Z') },
{ "_id" : "6", "flag" : true, "name" : "Zoe", "alias" : "22", "dateOfBirth": new ISODate('2000-12-27T00:00:00.000Z') }
This is my query:
db.people.aggregate([
{ '$match': { 'flag': true } },
{ '$project': {
'name': 1,
'alias': {
'$cond': [
{ '$eq': [ '$alias', null ] },
'$_id',
'$alias'
]
}
}},
{ '$group': {
'_id': '$alias',
'name': { '$first': '$name' },
'id': { '$first': '$_id' }
}},
{ '$project': {
'alias': {
'$cond': [
{ '$eq': [ '$id', '$_id' ] },
null,
'$_id'
]
},
'name': 1,
'_id': '$id'
}}
])
Which returns:
{ "_id" : "6", "name" : "Zoe", "alias" : "22" }
{ "_id" : "4", "name" : "Cristina", "alias" : null }
{ "_id" : "2", "name" : "Bob", "alias" : "4c" }
{ "_id" : "1", "name" : "Alice", "alias" : null }
So far so good.
UPDATE: Up to this point the qestion is similar to suggested duplicate question. Now comes the different part:
I need to sort it according to "dateOfBirth" field.
Very confident, I changed the query to:
db.people.aggregate([
{ '$match': { 'flag': true } },
{ '$project': {
'name': 1,
'dateOfBirth': 1,
'alias': {
'$cond': [
{ '$eq': [ '$alias', null ] },
'$_id',
'$alias'
]
}
}},
{ '$group': {
'_id': '$alias',
'name': { '$first': '$name' },
'dateOfBirth': { '$first': '$dateOfBirth' },
'id': { '$first': '$_id' }
}},
{ '$project': {
'alias': {
'$cond': [
{ '$eq': [ '$id', '$_id' ] },
null,
'$_id'
]
},
'name': 1,
'_id': '$id',
'dateOfBirth': 1,
}},
{ '$sort': { 'dateOfBirth': 1 }}
])
But this gives:
{ "_id" : "1", "name" : "Alice", "dateOfBirth" : ISODate("1995-12-27T00:00:00Z"), "alias" : null }
{ "_id" : "6", "name" : "Zoe", "dateOfBirth" : null, "alias" : "22" }
{ "_id" : "4", "name" : "Cristina", "dateOfBirth" : null, "alias" : null }
{ "_id" : "2", "name" : "Bob", "dateOfBirth" : null, "alias" : "4c" }
Which is wrong, of course: dateOfBirth field does not pass the $group stage...
Any clue on how to let it pass through?
Actually that works just fine for me, and I suspect you have a typo in the real code you are running ( not that listing ) that was missing the inclusion of "dateOfBirth" somewhere ( or possibly mispelled ).
But if there is a lesson to be learned here, it should be to not seperate $project and $group stages as it's not only inefficient to introduce another pipeline stage ( this means an extra pass through data ) but it a common cause for confusion when specifying data to include in the pipeline.
So rather do:
db.people.aggregate([
{ "$match": { "flag": true } },
{ "$group": {
"_id": {
"$ifNull": [ "$alias", "$_id" ]
},
"name": { "$first": "$name" },
"dateOfBirth": { "$first": "$dateOfBirth" },
"id": { "$first": "$_id" }
}},
{ "$project": {
"_id": "$id",
"name": 1,
"dateOfBirth": 1,
"alias": {
"$cond": [
{ "$eq": [ "$_id", "$id" ] },
null,
"$_id"
]
}
}},
{ "$sort": { "dateOfBirth": 1 } }
])
Which also makes use of $ifNull as a natural test rather than working with $cond where you do not need to.
And of course the desired result is returned:
{ "_id" : "1", "name" : "Alice", "dateOfBirth" : ISODate("1995-12-27T00:00:00Z"), "alias" : null }
{ "_id" : "2", "name" : "Bob", "dateOfBirth" : ISODate("1996-12-27T00:00:00Z"), "alias" : "4c" }
{ "_id" : "4", "name" : "Cristina", "dateOfBirth" : ISODate("1998-12-27T00:00:00Z"), "alias" : null }
{ "_id" : "6", "name" : "Zoe", "dateOfBirth" : ISODate("2000-12-27T00:00:00Z"), "alias" : "22" }
If you wanted "first by date of birth", then move the sort before the $group stage and the $first operator will do all the work there.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With