I have some log data stored in a mongo collection that includes basic information as a request_id and the time it was added to the collection, for example:
{
"_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
"request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
"time" : ISODate("2015-07-21T16:00:00.00Z")
}
I was wondering if I could use the aggregation framework to aggregate some statistical data. I would like to get the counts of the objects created within each interval of N minutes for the last X hours.
So the output which I need for 10 minutes intervals for the last 1 hour should be something like the following:
{ "_id" : 0, "time" : ISODate("2015-07-21T15:00:00.00Z"), "count" : 67 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:10:00.00Z"), "count" : 113 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:20:00.00Z"), "count" : 40 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:30:00.00Z"), "count" : 10 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:40:00.00Z"), "count" : 32 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:50:00.00Z"), "count" : 34 }
I would use that to get data for graphs.
Any advice is appreciated!
There are a couple of ways of approaching this depending on which output format best suits your needs. The main note is that with the "aggregation framework" itself, you cannot actually return something "cast" as a date, but you can get values that are easily reconstructed into a Date
object when processing results in your API.
The first approach is to use the "Date Aggregation Operators" available to the aggregation framework:
db.collection.aggregate([
{ "$match": {
"time": { "$gte": startDate, "$lt": endDate }
}},
{ "$group": {
"_id": {
"year": { "$year": "$time" },
"dayOfYear": { "$dayOfYear": "$time" },
"hour": { "$hour": "$time" },
"minute": {
"$subtract": [
{ "$minute": "$time" },
{ "$mod": [ { "$minute": "$time" }, 10 ] }
]
}
},
"count": { "$sum": 1 }
}}
])
Which returns a composite key for _id
containing all the values you want for a "date". Alternately if just within an "hour" always then just use the "minute" part and work out the actual date based on the startDate
of your range selection.
Or you can just use plain "Date math" to get the milliseconds since "epoch" which can again be fed to a date contructor directly.
db.collection.aggregate([
{ "$match": {
"time": { "$gte": startDate, "$lt": endDate }
}},
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$time", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$time", new Date(0) ] },
1000 * 60 * 10
]}
]
},
"count": { "$sum": 1 }
}}
])
In all cases what you do not want to do is use $project
before actually applying $group
. As a "pipeline stage", $project
must "cycle" though all documents selected and "transform" the content.
This takes time, and adds to the execution total of the query. You can simply just apply to the $group
directly as has been shown.
Or if you are really "pure" about a Date
object being returned without post processing, then you can always use "mapReduce", since the JavaScript functions actually allow recasting as a date, but slower than the aggregation framework and of course without a cursor response:
db.collection.mapReduce(
function() {
var date = new Date(
this.time.valueOf()
- ( this.time.valueOf() % ( 1000 * 60 * 10 ) )
);
emit(date,1);
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
Your best bet is using aggregation though, as transforming the response is quite easy:
db.collection.aggregate([
{ "$match": {
"time": { "$gte": startDate, "$lt": endDate }
}},
{ "$group": {
"_id": {
"year": { "$year": "$time" },
"dayOfYear": { "$dayOfYear": "$time" },
"hour": { "$hour": "$time" },
"minute": {
"$subtract": [
{ "$minute": "$time" },
{ "$mod": [ { "$minute": "$time" }, 10 ] }
]
}
},
"count": { "$sum": 1 }
}}
]).forEach(function(doc) {
doc._id = new Date(doc._id);
printjson(doc);
})
And then you have your interval grouping output with real Date
objects.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With