 

Calculating count and average with MongoDB aggregation

I have a simple db layout like this:

client
    id
    sex (male/female)
    birthday (date)    

client
    id
    sex (male/female)
    birthday (date)  

(...)

I'm trying to write an aggregation command that outputs how many male and female clients I have, along with the average age of the males and of the females. Can I do this in a single command, or do I need two separate ones?

// Count of males/females, average age
Clients.aggregate(
    {
        $project : {
            "sex"      : 1,
            "sexCount" : 1,
            "birthday" : 1,
            "avgAge"   : 1
        }
    },
    {
        $match : { "sex" : { $exists: true } }
    },
    {
        $group : {
            _id      : "$sex",
            sexCount : { $sum: 1 },
            avgAge   : { $avg: "$birthday" }
        }
    },
    { $sort : { _id: 1 } },
    function(err, sex_dbres) {
        if (err)
            throw err;
        else {
            (...)
        }
    });

With the code above I get the counts of males/females, but avgAge comes back as 0. Any ideas?

Many thanks

asked Oct 20 '12 at 16:10 by Rafa Llorente



2 Answers

The answer would be much simpler if you stored age in the original document: as Dmitry posted, you could then just use avgAge: {$avg: "$age"} in your $group step.
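With a stored age, the count and the average both come out of the same $group stage, so one command is enough. The plain-JavaScript sketch below (no MongoDB involved) mimics what such a stage computes; it assumes each document carries a numeric `age` field, which is hypothetical since the question only stores `birthday`, and `groupBySex` is an illustrative name.

```javascript
// Plain-JavaScript stand-in for:
//   { $group: { _id: "$sex", total: { $sum: 1 }, avgAge: { $avg: "$age" } } }
// Assumption: every document already has a numeric "age" field (hypothetical).
function groupBySex(docs) {
  const groups = new Map(); // sex -> { total, ageSum }
  for (const doc of docs) {
    const g = groups.get(doc.sex) || { total: 0, ageSum: 0 };
    g.total += 1;        // like $sum: 1
    g.ageSum += doc.age; // running sum used for the average
    groups.set(doc.sex, g);
  }
  return [...groups].map(([sex, g]) => ({
    _id: sex,
    total: g.total,
    avgAge: g.ageSum / g.total, // like $avg: "$age"
  }));
}

const out = groupBySex([
  { sex: "male", age: 12 },
  { sex: "male", age: 25 },
  { sex: "female", age: 23 },
]);
console.log(out);
```

Both accumulators are updated in the same pass over the documents, which is why a single $group can report the count and the average together.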

The Aggregation Framework is pretty nifty, though, and has many useful operators that let you compute this missing age field "on the fly".

I'm going to store each step of the aggregation in a variable so it's easier to see what's going on:

today = new Date();
// split today and bday into numerical year and numerical day-of-the-year
project1= {
    "$project" : {
        "sex" : 1,
        "todayYear" : {
            "$year" : today
        },
        "todayDay" : {
            "$dayOfYear" : today
        },
        "by" : {
            "$year" : "$bday"
        },
        "bd" : {
            "$dayOfYear" : "$bday"
        }
    }
};
// calculate age in days by subtracting bday in days from today in days
project2 = {
    "$project" : {
        "sex" : 1,
        "age" : {
            "$subtract" : [
                {
                    "$add" : [
                        {
                            "$multiply" : [
                                "$todayYear",
                                365
                            ]
                        },
                        "$todayDay"
                    ]
                },
                {
                    "$add" : [
                        {
                            "$multiply" : [
                                "$by",
                                365
                            ]
                        },
                        "$bd"
                    ]
                }
            ]
        }
    }
};
// sum up for each sex the count and compute avg age (in days)
group = {
    "$group" : {
        "_id" : "$sex",
        "total" : {
            "$sum" : 1
        },
        "avgAge" : {
            "$avg" : "$age"
        }
    }
};
// divide days by 365 to get age in years.
project3 = {
    "$project" : {
        "_id" : 0,
        "sex" : "$_id",
        "total" : 1,
        "averageAge" : {
            "$divide" : [
                "$avgAge",
                365
            ]
        }
    }
};
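The arithmetic in `project1`, `project2` and `project3` can be checked outside the database. The plain-JavaScript sketch below mirrors it, assuming `$year` and `$dayOfYear` behave like their UTC counterparts (`dayOfYear`, `ageInDays` and `ageInYears` are hypothetical helper names). Because every year is counted as 365 days, leap days are ignored and the result is an approximation.

```javascript
// 1-based day of the year in UTC (hypothetical helper mirroring $dayOfYear).
function dayOfYear(date) {
  const startOfYear = Date.UTC(date.getUTCFullYear(), 0, 1);
  const startOfDay = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
  return (startOfDay - startOfYear) / 86400000 + 1;
}

// project1 + project2: (todayYear * 365 + todayDay) - (by * 365 + bd)
function ageInDays(bday, today) {
  const todayNum = today.getUTCFullYear() * 365 + dayOfYear(today);
  const bdayNum = bday.getUTCFullYear() * 365 + dayOfYear(bday);
  return todayNum - bdayNum;
}

// project3 divides by 365; in the pipeline this is applied to the group average.
function ageInYears(bday, today) {
  return ageInDays(bday, today) / 365;
}

const today = new Date(Date.UTC(2012, 9, 20)); // October 20, 2012 (assumed)
console.log(ageInDays(new Date(Date.UTC(2000, 1, 2)), today)); // 4641
```

Dividing each document's age by 365 and then averaging gives the same value as averaging first and dividing once, which is what `group` followed by `project3` does.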

Now you can run the aggregation:

> db.client.find({},{_id:0})
{ "sex" : "male", "bday" : ISODate("2000-02-02T08:00:00Z") }
{ "sex" : "male", "bday" : ISODate("1987-02-02T08:00:00Z") }
{ "sex" : "female", "bday" : ISODate("1989-02-02T08:00:00Z") }
{ "sex" : "female", "bday" : ISODate("1993-11-02T08:00:00Z") }
> db.client.aggregate([ project1, project2, group, project3 ])
{
    "result" : [
        {
            "sex" : "female",
            "total" : 2,
            "averageAge" : 21.34109589041096
        },
        {
            "sex" : "male",
            "total" : 2,
            "averageAge" : 19.215068493150685
        }
    ],
    "ok" : 1
}
> 
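As a cross-check, the four stages can be replayed in plain JavaScript over the same four sample documents. The answer does not say on which day the aggregation was run; the sketch below assumes October 20, 2012 (UTC), and with that assumed date the computed totals and averages match the output above. `dayOfYear` is a hypothetical helper mirroring `$dayOfYear`.

```javascript
// Replay of project1 -> project2 -> group -> project3 over the sample documents.
// Assumption: "today" is 2012-10-20 (UTC); the original run date is not stated.
function dayOfYear(d) {
  const start = Date.UTC(d.getUTCFullYear(), 0, 1);
  const day = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
  return (day - start) / 86400000 + 1;
}

const docs = [
  { sex: "male", bday: new Date("2000-02-02T08:00:00Z") },
  { sex: "male", bday: new Date("1987-02-02T08:00:00Z") },
  { sex: "female", bday: new Date("1989-02-02T08:00:00Z") },
  { sex: "female", bday: new Date("1993-11-02T08:00:00Z") },
];
const today = new Date("2012-10-20T12:00:00Z");

// project1 + project2: approximate age in days (365-day years).
const withAge = docs.map((d) => ({
  sex: d.sex,
  age:
    today.getUTCFullYear() * 365 + dayOfYear(today) -
    (d.bday.getUTCFullYear() * 365 + dayOfYear(d.bday)),
}));

// group: count and average age (in days) per sex.
const groups = new Map();
for (const { sex, age } of withAge) {
  const g = groups.get(sex) || { total: 0, ageSum: 0 };
  g.total += 1;
  g.ageSum += age;
  groups.set(sex, g);
}

// project3: rename _id to sex and convert the average from days to years.
const result = [...groups].map(([sex, g]) => ({
  sex,
  total: g.total,
  averageAge: g.ageSum / g.total / 365,
}));
console.log(result);
```

The output order may differ from the shell's, since $group does not guarantee any particular order without a $sort.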

The reason this isn't simpler is that the Aggregation Framework currently does not support subtracting dates directly. Please vote for https://jira.mongodb.org/browse/SERVER-6239, which is targeted for the next major release. Once it's implemented, you should be able to subtract dates directly (though you will still need to convert the result to an appropriate granularity, probably years in this case).
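To illustrate the granularity point outside MongoDB: in plain JavaScript, subtracting one `Date` from another already yields milliseconds, and the remaining work is converting that number to a useful unit. The sketch below is plain JavaScript, not an aggregation stage; the 365.25-day year and the helper names `elapsedMs` and `msToYears` are assumptions for illustration.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Subtracting two Date objects coerces both to epoch milliseconds.
function elapsedMs(from, to) {
  return to - from;
}

// Convert milliseconds to years.
// Assumption: an average year length of 365.25 days.
function msToYears(ms) {
  return ms / (365.25 * MS_PER_DAY);
}

const bday = new Date(Date.UTC(2000, 1, 2));
const today = new Date(Date.UTC(2012, 9, 20));
const ms = elapsedMs(bday, today);
console.log(ms / MS_PER_DAY); // 4644 (exact days, leap days included)
console.log(msToYears(ms));
```

For the same dates, the 365-days-per-year arithmetic in `project2` gives 4641 days; the difference of 3 comes from the leap days it ignores.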

answered Oct 02 '22 at 02:10 by Asya Kamsky



A Date object can't be "averaged", but numbers can. You can convert your dates to timestamp values and then average those. That still won't be an average age, though: you'll need to subtract the result from the current date outside the aggregation function.
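A plain-JavaScript sketch of that two-step idea: average the birthday timestamps (the part that would happen inside the aggregation), then subtract the average from the current time afterwards. Because averaging is linear, "today minus the average birth timestamp" equals the average of the individual ages. The 365.25-day year and the helper names are assumptions.

```javascript
const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000; // assumed average year length

// Step 1: average the birthdays as plain numbers (epoch milliseconds).
function averageTimestamp(dates) {
  const sum = dates.reduce((acc, d) => acc + d.getTime(), 0);
  return sum / dates.length;
}

// Step 2, outside the aggregation: turn the average timestamp into an age.
function averageAgeYears(dates, today) {
  return (today.getTime() - averageTimestamp(dates)) / MS_PER_YEAR;
}

const birthdays = [new Date(Date.UTC(2000, 1, 2)), new Date(Date.UTC(1987, 1, 2))];
const today = new Date(Date.UTC(2012, 9, 20));
console.log(averageAgeYears(birthdays, today));
```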

Another option is to assume that age can be calculated from the year part of the date alone (that is, if I was born on December 1, 2000, I'll be 12 years old in today's report, not 11). In that case you can use the date operators to extract the year value:

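// Pipeline stages: extract the birth year, then derive an age from it
{ $project : { "sex"  : 1,
               "year" : { $year: "$birthday" } } },
// 2012 is the current year at the time of writing
{ $project : { "sex" : 1,
               "age" : { $subtract: [2012, "$year"] } } },
// ...then $group as in the question, with avgAge : { $avg: "$age" }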
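The year-only approximation can also be sketched in plain JavaScript (the helper names are hypothetical). Using the example from the answer, someone born on December 1, 2000 counts as 12 in a report run in October 2012, although their exact age is still 11; the October 20, 2012 report date is an assumption.

```javascript
// Year-only age, as in the second $project: currentYear - birthYear.
function yearOnlyAge(birthday, today) {
  return today.getUTCFullYear() - birthday.getUTCFullYear();
}

// Exact age in whole years, for comparison: subtract one if this year's
// birthday has not happened yet.
function exactAge(birthday, today) {
  const hadBirthday =
    today.getUTCMonth() > birthday.getUTCMonth() ||
    (today.getUTCMonth() === birthday.getUTCMonth() &&
      today.getUTCDate() >= birthday.getUTCDate());
  return yearOnlyAge(birthday, today) - (hadBirthday ? 0 : 1);
}

const bday = new Date(Date.UTC(2000, 11, 1)); // December 1, 2000
const today = new Date(Date.UTC(2012, 9, 20)); // October 20, 2012 (assumed)
console.log(yearOnlyAge(bday, today), exactAge(bday, today)); // 12 11
```

The year-only version can overstate an individual age by up to one year, which is the trade-off this answer accepts in exchange for a simpler pipeline.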
answered Oct 02 '22 at 00:10 by Dmitry
