I would like to use mongoDB's Aggregation Framework to run what in SQL would look a bit like:
SELECT SUM(A), B, C from myTable GROUP BY B, C;
The docs state:
You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.
But it's unclear what 'an aggregate key made from several incoming fields' actually is?
My dataset is a bit like this:
[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}, { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}, { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}, { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}, { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}, { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}]
My query looks like this:
resultsCollection.aggregate( { $match : { testid : testid} }, { $skip : alreadyRead }, { $project : { timeStamp : 1 , label : 1, responseCode : 1 , value : 1, success : 1 }}, { $group : { _id : "$label", max_timeStamp : { $timeStamp : 1 }, count_responseCode : { $sum : 1 }, avg_value : { $sum : "$value" }, count_success : { $sum : 1 } }}, { $group : { ? }} );
My instinct was to try to pipe the results through to a second group, I know you can do this but it won't work because the first group already reduces the dataset too much and the required level of detail is lost.
What I want to do is group using label
, responseCode
and success
and get the sum of value from the result. It should look a bit like:
label | code | success | sum_of_values | count sharon | 200 | true | 10 | 1 sharon | 200 | false | 35 | 1 paul | 200 | true | 100 | 2 paul | 404 | true | 15 | 1 paul | 404 | false | 99 | 1
Where there are five groups:
1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"} 2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"} 3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"} { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"} 4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"} 5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}
In the MongoDB database, group by is used to group the data from the collection. We can also use the aggregation function as well and group the method. The aggregate function is used in multiple conditions. We can group by single as well as multiple fields from the collection.
collection. aggregate () can use several channels at the same time for data processing. Db. collection.
We can group by single as well as multiple field from the collection, we can use $group operator in MongoDB to group fields from the collection and returns the new document as result. We are using $avg, $sum, $max, $min, $push, $last, $first and $addToSet operator with group by in MongoDB.
OK, so the solution is to specify an aggregate key for the _id value. This is documented here as:
You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.
But it doesn't actually define the format for an aggregate key. Reading the earlier documentation here I saw that the previous collection.group method could take multiple fields and that the same structure is used in the new framework.
So, to group over multiple fields you could use _id : { success:'$success', responseCode:'$responseCode', label:'$label'}
As in:
resultsCollection.aggregate( { $match : { testid : testid} }, { $skip : alreadyRead }, { $project : { timeStamp : 1 , label : 1, responseCode : 1 , value : 1, success : 1 }}, { $group : { _id : { success:'$success', responseCode:'$responseCode', label:'$label'}, max_timeStamp : { $timeStamp : 1 }, count_responseCode : { $sum : 1 }, avg_value : { $sum : "$value" }, count_success : { $sum : 1 } }} );
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With