Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongodb Aggregation Framework | Group over multiple values?

I would like to use mongoDB's Aggregation Framework to run what in SQL would look a bit like:

SELECT SUM(A), B, C from myTable GROUP BY B, C; 

The docs state:

You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.

But it's unclear what 'an aggregate key made from several incoming fields' actually is?

My dataset is a bit like this:

[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}, { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}, { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}, { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}, { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}, { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}] 

My query looks like this:

resultsCollection.aggregate(     { $match : { testid : testid} },     { $skip : alreadyRead },     { $project : {             timeStamp : 1 ,             label : 1,             responseCode : 1 ,             value : 1,             success : 1         }},     { $group : {             _id : "$label",             max_timeStamp : { $timeStamp : 1 },             count_responseCode : { $sum : 1 },             avg_value : { $sum : "$value" },             count_success : { $sum : 1 }         }},     { $group : {             ?         }} ); 

My instinct was to try to pipe the results through to a second group, I know you can do this but it won't work because the first group already reduces the dataset too much and the required level of detail is lost.

What I want to do is group using label, responseCode and success and get the sum of value from the result. It should look a bit like:

label   | code | success | sum_of_values | count sharon  | 200  |  true   |      10       |   1 sharon  | 200  |  false  |      35       |   1 paul    | 200  |  true   |      100      |   2 paul    | 404  |  true   |      15       |   1 paul    | 404  |  false  |      99       |   1 

Where there are five groups:

1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}  2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}  3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}    { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}  4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}  5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"} 
like image 757
Oliver Lloyd Avatar asked Jul 10 '12 17:07

Oliver Lloyd


People also ask

Can we use multiple group in MongoDB?

In the MongoDB database, group by is used to group the data from the collection. We can also use the aggregation function as well and group the method. The aggregate function is used in multiple conditions. We can group by single as well as multiple fields from the collection.

Can we specify more than one aggregate function simultaneously in MongoDB?

collection. aggregate () can use several channels at the same time for data processing. Db. collection.

How can you group by a particular value in MongoDB?

We can group by single as well as multiple field from the collection, we can use $group operator in MongoDB to group fields from the collection and returns the new document as result. We are using $avg, $sum, $max, $min, $push, $last, $first and $addToSet operator with group by in MongoDB.


1 Answers

OK, so the solution is to specify an aggregate key for the _id value. This is documented here as:

You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.

But it doesn't actually define the format for an aggregate key. Reading the earlier documentation here I saw that the previous collection.group method could take multiple fields and that the same structure is used in the new framework.

So, to group over multiple fields you could use _id : { success:'$success', responseCode:'$responseCode', label:'$label'}

As in:

resultsCollection.aggregate( { $match : { testid : testid} }, { $skip : alreadyRead }, { $project : {         timeStamp : 1 ,         label : 1,         responseCode : 1 ,         value : 1,         success : 1     }}, { $group : {         _id :  { success:'$success', responseCode:'$responseCode', label:'$label'},         max_timeStamp : { $timeStamp : 1 },         count_responseCode : { $sum : 1 },         avg_value : { $sum : "$value" },         count_success : { $sum : 1 }     }} ); 
like image 147
Oliver Lloyd Avatar answered Sep 17 '22 18:09

Oliver Lloyd