Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDb: How to aggregate, group and sort on date field?

In my MongoDB people collection I need to filter people with the same 'alias' property value, keeping the first one of them, and also keeping all people with a null 'alias'.

Data is like this:

{ "_id" : "1", "flag" : true,  "name" : "Alice",    "alias" : null, "dateOfBirth": new ISODate('1995-12-27T00:00:00.000Z') },
{ "_id" : "2", "flag" : true,  "name" : "Bob",      "alias" : "4c", "dateOfBirth": new ISODate('1996-12-27T00:00:00.000Z') },
{ "_id" : "3", "flag" : true,  "name" : "Bobby",    "alias" : "4c", "dateOfBirth": new ISODate('1997-12-27T00:00:00.000Z') },
{ "_id" : "4", "flag" : true,  "name" : "Cristina", "alias" : null, "dateOfBirth": new ISODate('1998-12-27T00:00:00.000Z') },
{ "_id" : "5", "flag" : false, "name" : "Diego",    "alias" : null, "dateOfBirth": new ISODate('1999-12-27T00:00:00.000Z') },
{ "_id" : "6", "flag" : true,  "name" : "Zoe",      "alias" : "22", "dateOfBirth": new ISODate('2000-12-27T00:00:00.000Z') }

This is my query:

db.people.aggregate([ 
    { '$match': { 'flag': true } }, 
    { '$project': {
        'name': 1,          
        'alias': { 
            '$cond': [
                { '$eq': [ '$alias', null ] }, 
                '$_id', 
                '$alias' 
            ]
        }
    }},
    { '$group': {
        '_id': '$alias',         
        'name':  { '$first': '$name' },          
        'id': { '$first': '$_id' }       
    }}, 
    { '$project': {
        'alias': {
            '$cond': [ 
                { '$eq': [ '$id', '$_id' ] }, 
                null, 
               '$_id' 
            ]
        }, 
        'name': 1,
        '_id': '$id'
    }}
])

Which returns:

{ "_id" : "6", "name" : "Zoe",      "alias" : "22" }
{ "_id" : "4", "name" : "Cristina", "alias" : null }
{ "_id" : "2", "name" : "Bob",      "alias" : "4c" }
{ "_id" : "1", "name" : "Alice",    "alias" : null }

So far so good.

UPDATE: Up to this point the qestion is similar to suggested duplicate question. Now comes the different part:

I need to sort it according to "dateOfBirth" field.
Very confident, I changed the query to:

db.people.aggregate([ 
    { '$match': { 'flag': true } }, 
    { '$project': {
        'name': 1,          
        'dateOfBirth': 1,
        'alias': { 
            '$cond': [
                { '$eq': [ '$alias', null ] }, 
                '$_id', 
                '$alias' 
            ]
        }
    }},
    { '$group': {
        '_id': '$alias',         
        'name':  { '$first': '$name' },          
        'dateOfBirth': { '$first': '$dateOfBirth' },
        'id': { '$first': '$_id' }       
    }}, 
    { '$project': {
        'alias': {
            '$cond': [ 
                { '$eq': [ '$id', '$_id' ] }, 
                null, 
               '$_id' 
            ]
        }, 
        'name': 1,
        '_id': '$id',
        'dateOfBirth': 1,
    }},
    { '$sort': { 'dateOfBirth': 1 }}
])

But this gives:

{ "_id" : "1", "name" : "Alice",    "dateOfBirth" : ISODate("1995-12-27T00:00:00Z"), "alias" : null }
{ "_id" : "6", "name" : "Zoe",      "dateOfBirth" : null, "alias" : "22" }
{ "_id" : "4", "name" : "Cristina", "dateOfBirth" : null, "alias" : null }
{ "_id" : "2", "name" : "Bob",      "dateOfBirth" : null, "alias" : "4c" }

Which is wrong, of course: dateOfBirth field does not pass the $group stage...

Any clue on how to let it pass through?

like image 727
MarcoS Avatar asked Nov 08 '25 03:11

MarcoS


1 Answers

Actually that works just fine for me, and I suspect you have a typo in the real code you are running ( not that listing ) that was missing the inclusion of "dateOfBirth" somewhere ( or possibly mispelled ).

But if there is a lesson to be learned here, it should be to not seperate $project and $group stages as it's not only inefficient to introduce another pipeline stage ( this means an extra pass through data ) but it a common cause for confusion when specifying data to include in the pipeline.

So rather do:

db.people.aggregate([
    { "$match": { "flag": true } },
    { "$group": {
        "_id": {
            "$ifNull": [ "$alias", "$_id" ]
        },
        "name": { "$first": "$name" },
        "dateOfBirth": { "$first": "$dateOfBirth" },
        "id": { "$first": "$_id" }
    }},
    { "$project": {
        "_id": "$id",
        "name": 1,
        "dateOfBirth": 1,
        "alias": {
            "$cond": [
                { "$eq": [ "$_id", "$id" ] },
                null,
                "$_id"
            ]
        }
    }},
    { "$sort": { "dateOfBirth": 1 } }
]) 

Which also makes use of $ifNull as a natural test rather than working with $cond where you do not need to.

And of course the desired result is returned:

{ "_id" : "1", "name" : "Alice", "dateOfBirth" : ISODate("1995-12-27T00:00:00Z"), "alias" : null }
{ "_id" : "2", "name" : "Bob", "dateOfBirth" : ISODate("1996-12-27T00:00:00Z"), "alias" : "4c" }
{ "_id" : "4", "name" : "Cristina", "dateOfBirth" : ISODate("1998-12-27T00:00:00Z"), "alias" : null }
{ "_id" : "6", "name" : "Zoe", "dateOfBirth" : ISODate("2000-12-27T00:00:00Z"), "alias" : "22" }

If you wanted "first by date of birth", then move the sort before the $group stage and the $first operator will do all the work there.

like image 74
Blakes Seven Avatar answered Nov 12 '25 15:11

Blakes Seven



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!