I have a dataset with a datapoint every 5 seconds, which adds up to 17,280 items per day (86,400 / 5). That is far too many, and I want to reduce it because I'm using these items to draw a graph.
Since the graph's x-axis is time, I decided that one datapoint every 5 minutes is good enough. That gives 288 datapoints a day (1,440 / 5): far fewer, and still enough to draw the graph.
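The documents in my MongoDB collection look like this:
{
    "timestamp": 12323455,
    "someKey": 123,
    "someOtherKey": 345,
    "someOtherOtherKey": 6789
}
The data gets posted into the database every 5 seconds, so consecutive timestamps differ by 5 seconds. The timestamp is a Unix timestamp in seconds. It has to be stored as a number, not a string, for the numeric $gte comparison and the $mod arithmetic used below to work.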
Because my x-axis is divided into 5-minute intervals, I'd like to calculate the average values of someKey, someOtherKey and someOtherOtherKey over each of these 5-minute intervals.
Each of these averages becomes one datapoint in my graph.
How can I get all the datapoints for one day, where each datapoint is a 5-minute average and consecutive datapoints are 5 minutes apart (288 datapoints per day)?
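The counts above, and the bucketing idea the rest of this post relies on (rounding each timestamp down to a multiple of 300 seconds), can be checked with a small in-memory JavaScript sketch. `bucketStart` and the other names are hypothetical; the midnight value is the one used in the aggregation queries in this post.

```javascript
// Hypothetical helper: round a Unix timestamp (seconds) down to the start
// of its 300-second (5-minute) bucket, i.e. ts - (ts mod 300).
const BUCKET_SECONDS = 300;
const SAMPLE_SECONDS = 5;
const SECONDS_PER_DAY = 24 * 60 * 60; // 86400

function bucketStart(ts) {
  return ts - (ts % BUCKET_SECONDS);
}

// One day of samples taken every 5 seconds, starting at midnight
// (1464559200 is the midnight timestamp used in the aggregation queries).
const MIDNIGHT = 1464559200;
const samples = [];
for (let ts = MIDNIGHT; ts < MIDNIGHT + SECONDS_PER_DAY; ts += SAMPLE_SECONDS) {
  samples.push(ts);
}

// Count how many samples fall into each 5-minute bucket.
const counts = new Map();
for (const ts of samples) {
  const key = bucketStart(ts);
  counts.set(key, (counts.get(key) || 0) + 1);
}

console.log(samples.length);       // 17280 raw datapoints
console.log(counts.size);          // 288 buckets
console.log(counts.get(MIDNIGHT)); // 60 samples per bucket
```

Because 1464559200 is itself a multiple of 300, the buckets line up exactly with midnight, so every bucket holds 60 samples.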
For now I'm selecting every document since midnight today:
$result = $collection
    ->createQueryBuilder()
    ->field('timestamp')->gte($todayMidnight)
    ->sort('timestamp', 'desc')
    ->getQuery()
    ->execute();
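The query relies on `$todayMidnight`, which the post does not define. Assuming it is the Unix timestamp in seconds of the most recent local midnight (the 1464559200 literal used later corresponds to 2016-05-30 00:00 in a UTC+2 time zone), a JavaScript equivalent might look like this. `todayMidnight` is a hypothetical helper, and it uses the local time zone of whatever machine runs it.

```javascript
// Hypothetical helper: Unix timestamp (seconds) of the most recent local
// midnight. Depends on the process's local time zone, so servers in
// different zones will get different values.
function todayMidnight(now = new Date()) {
  const midnight = new Date(now.getTime());
  midnight.setHours(0, 0, 0, 0); // local 00:00:00.000 of the same day
  return Math.floor(midnight.getTime() / 1000);
}

const nowSeconds = Math.floor(Date.now() / 1000);
const start = todayMidnight();
console.log(start, nowSeconds - start); // midnight, seconds elapsed today
```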
How can I reduce this list of data (within the same query) to one datapoint every 5 minutes, where each datapoint is the average of the points within those 5 minutes?
Ideally this query would be built with Doctrine, since I need it in my Symfony application.
EDIT: As suggested in the comments, I should use the aggregation framework, so I first tried to get the query working in the mongo shell.
The query I've written so far is based on another question asked here on Stack Overflow.
This is the current query:
db.Pizza.aggregate([
    {
        $match: {
            timestamp: {$gte: 1464559200}
        }
    },
    {
        $group: {
            _id: {
                $subtract: [
                    "$timestamp",
                    {"$mod": ["$timestamp", 300]}
                ]
            },
            "timestamp": {"$first": "$timestamp"},
            "someKey": {"$first": "$someKey"},
            "someOtherKey": {"$first": "$someOtherKey"},
            "someOtherOtherKey": {"$first": "$someOtherOtherKey"}
        }
    }
])
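This query returns one document per 300-second (5-minute) bucket since midnight today, but each field is taken from a single document in the bucket ($first, which without a preceding $sort is not guaranteed to be the earliest or the latest one).
Instead, I want it to take all documents within each 300-second bucket and calculate the average of someKey, someOtherKey and someOtherOtherKey.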
For example, given this dataset:
{
    "timestamp": 1464559215,
    "someKey": 123,
    "someOtherKey": 345,
    "someOtherOtherKey": 6789
},
{
    "timestamp": 1464559220,
    "someKey": 54,
    "someOtherKey": 20,
    "someOtherOtherKey": 511
},
{
    "timestamp": 1464559225,
    "someKey": 654,
    "someOtherKey": 10,
    "someOtherOtherKey": 80
},
{
    "timestamp": 1464559505,
    "someKey": 90,
    "someOtherKey": 51,
    "someOtherOtherKey": 1
}
the query should return these 2 rows:
{
    "timestamp": 1464559225,
    "someKey": 277,
    "someOtherKey": 125,
    "someOtherOtherKey": 2460
},
{
    "timestamp": 1464559505,
    "someKey": 90,
    "someOtherKey": 51,
    "someOtherOtherKey": 1
}
The first result is calculated like this:
Result 1 - someKey = (123+54+654)/3 = 277
Result 1 - someOtherKey = (345+20+10)/3 = 125
Result 1 - someOtherOtherKey = (6789+511+80)/3 = 2460
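The same per-bucket averages can be checked in plain JavaScript. This sketch hard-codes the three example documents that fall into the bucket starting at 1464559200 and averages each field, which is what `$avg` does for each group; `average` is a hypothetical helper.

```javascript
// The three example documents that share the bucket starting at 1464559200.
const firstBucket = [
  { timestamp: 1464559215, someKey: 123, someOtherKey: 345, someOtherOtherKey: 6789 },
  { timestamp: 1464559220, someKey: 54, someOtherKey: 20, someOtherOtherKey: 511 },
  { timestamp: 1464559225, someKey: 654, someOtherKey: 10, someOtherOtherKey: 80 },
];

// Arithmetic mean of one field across the documents, like $avg in $group.
function average(docs, field) {
  return docs.reduce((sum, doc) => sum + doc[field], 0) / docs.length;
}

const result = {
  timestamp: firstBucket[firstBucket.length - 1].timestamp, // like $last
  someKey: average(firstBucket, "someKey"),
  someOtherKey: average(firstBucket, "someOtherKey"),
  someOtherOtherKey: average(firstBucket, "someOtherOtherKey"),
};
console.log(result.someKey, result.someOtherKey, result.someOtherOtherKey); // 277 125 2460
```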
How can I do this calculation in the mongo shell with the aggregation framework?
Based on the answers given here on Stack Overflow, I managed to get exactly what I wanted.
This is the full aggregation query that returns all my results:
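db.Pizza.aggregate([
    // Only documents from today's midnight (Unix seconds) onwards
    {
        $match: {
            timestamp: {$gte: 1464559200}
        }
    },
    // Sort first so that $last below picks the latest document in each bucket
    {
        $sort: {timestamp: 1}
    },
    {
        $group: {
            // Round each timestamp down to the start of its 300-second (5-minute) bucket
            _id: {
                $subtract: [
                    '$timestamp',
                    {$mod: ['$timestamp', 300]}
                ]
            },
            timestamp: {$last: '$timestamp'},
            someKey: {$avg: '$someKey'},
            someOtherKey: {$avg: '$someOtherKey'},
            someOtherOtherKey: {$avg: '$someOtherOtherKey'}
        }
    },
    // $group does not guarantee output order, so order the buckets for the graph
    {
        $sort: {_id: 1}
    },
    // Drop the bucket _id; the result no longer maps to a stored document
    {
        $project: {
            _id: 0,
            timestamp: 1,
            someKey: 1,
            someOtherKey: 1,
            someOtherOtherKey: 1
        }
    }
])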
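The $match stage selects every document from midnight today onwards (1464559200 is the Unix timestamp of today's midnight).
The $group stage is the most interesting part. For every matched document it computes timestamp - (timestamp mod 300), which rounds the timestamp down to the start of its 300-second (5-minute) bucket, and uses that value as the group _id. All documents in the same bucket are then combined: $avg averages each field, and $last takes the timestamp of the last document in the bucket, which is only well-defined if the documents are sorted by timestamp before the group stage.
The $project stage removes the _id (the bucket start) from the output, since each result no longer corresponds to a document stored in the database.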
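To make these stages concrete, here is a rough in-memory JavaScript equivalent of the pipeline (filter, sort, bucket by `timestamp - timestamp % 300`, average, keep the last timestamp, drop `_id`), run on the four example documents. It is an illustration only, not how MongoDB executes the stages: `aggregateBuckets` is a hypothetical name, and unlike `$avg` it does not skip missing or non-numeric fields.

```javascript
// Rough in-memory equivalent of $match + $sort + $group + $sort + $project.
function aggregateBuckets(docs, since, bucketSeconds, fields) {
  const groups = new Map();
  const sorted = docs
    .filter((d) => d.timestamp >= since)        // $match
    .sort((a, b) => a.timestamp - b.timestamp); // $sort by timestamp
  for (const doc of sorted) {
    // $group _id: round down to the start of the bucket
    const id = doc.timestamp - (doc.timestamp % bucketSeconds);
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(doc);
  }
  return [...groups.keys()]
    .sort((a, b) => a - b) // $sort by bucket _id
    .map((id) => {
      const bucket = groups.get(id);
      const row = { timestamp: bucket[bucket.length - 1].timestamp }; // $last
      for (const f of fields) {
        row[f] = bucket.reduce((s, d) => s + d[f], 0) / bucket.length; // $avg
      }
      return row; // no _id, like $project {_id: 0}
    });
}

const docs = [
  { timestamp: 1464559215, someKey: 123, someOtherKey: 345, someOtherOtherKey: 6789 },
  { timestamp: 1464559220, someKey: 54, someOtherKey: 20, someOtherOtherKey: 511 },
  { timestamp: 1464559225, someKey: 654, someOtherKey: 10, someOtherOtherKey: 80 },
  { timestamp: 1464559505, someKey: 90, someOtherKey: 51, someOtherOtherKey: 1 },
];

const rows = aggregateBuckets(docs, 1464559200, 300, ["someKey", "someOtherKey", "someOtherOtherKey"]);
console.log(rows.length); // 2
console.log(rows);
```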
Answers this solution is based on:
MongoDB - Aggregate max/min/average for multiple variables at once
How to subtract in mongodb php
MongoDB : Aggregation framework : Get last dated document per grouping ID
Doctrine Solution
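$result = $collection->aggregate([
    // Only documents from today's midnight (Unix seconds) onwards
    [
        '$match' => [
            'timestamp' => ['$gte' => 1464559200]
        ]
    ],
    // Sort first so that $last picks the latest document in each bucket
    [
        '$sort' => ['timestamp' => 1]
    ],
    [
        '$group' => [
            // Round each timestamp down to the start of its 300-second bucket
            '_id' => [
                '$subtract' => [
                    '$timestamp',
                    ['$mod' => ['$timestamp', 300]]
                ]
            ],
            'timestamp' => ['$last' => '$timestamp'],
            'someKey' => ['$avg' => '$someKey'],
            'someOtherKey' => ['$avg' => '$someOtherKey'],
            'someOtherOtherKey' => ['$avg' => '$someOtherOtherKey']
        ]
    ],
    // Order the buckets chronologically for the graph
    [
        '$sort' => ['_id' => 1]
    ],
    // Drop the bucket _id from the output
    [
        '$project' => [
            '_id' => 0,
            'timestamp' => 1,
            'someKey' => 1,
            'someOtherKey' => 1,
            'someOtherOtherKey' => 1
        ]
    ]
]);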