Multiple group operations using Mongo aggregation framework


Given a set of questions, each linked to a survey and a category by id:

    > db.questions.find().toArray();
    [
        {
            "_id" : ObjectId("4fda05bc322b1c95b531ac25"),
            "id" : 1,
            "name" : "Question 1",
            "category_id" : 1,
            "survey_id" : 1,
            "score" : 5
        },
        {
            "_id" : ObjectId("4fda05cb322b1c95b531ac26"),
            "id" : 2,
            "name" : "Question 2",
            "category_id" : 1,
            "survey_id" : 1,
            "score" : 3
        },
        {
            "_id" : ObjectId("4fda05d9322b1c95b531ac27"),
            "id" : 3,
            "name" : "Question 3",
            "category_id" : 2,
            "survey_id" : 1,
            "score" : 4
        },
        {
            "_id" : ObjectId("4fda4287322b1c95b531ac28"),
            "id" : 4,
            "name" : "Question 4",
            "category_id" : 2,
            "survey_id" : 1,
            "score" : 7
        }
    ]

I can find the category average with:

    db.questions.aggregate(
        { $group : {
            _id : "$category_id",
            avg_score : { $avg : "$score" }
        } }
    );

    {
        "result" : [
            {
                "_id" : 1,
                "avg_score" : 4
            },
            {
                "_id" : 2,
                "avg_score" : 5.5
            }
        ],
        "ok" : 1
    }

How can I get the average of category averages (note this is different than simply averaging all questions)? I would assume I would do multiple group operations but this fails:

    > db.questions.aggregate(
    ...   { $group : {
    ...     _id : "$category_id",
    ...     avg_score : { $avg : "$score" },
    ...   }},
    ...   { $group : {
    ...     _id : "$survey_id",
    ...     avg_score : { $avg : "$score" },
    ...   }}
    ... );
    {
        "errmsg" : "exception: the _id field for a group must not be undefined",
        "code" : 15956,
        "ok" : 0
    }
    >


asked Jun 14 '12 20:06

Allyl Isocyanate

1 Answer

It's important to understand that the operations passed as arguments to aggregate() form a pipeline. This means that the input to any element of the pipeline is the stream of documents produced by the previous element in the pipeline.

In your example, the first $group stage produces a stream of documents that look like this:

    {
        "_id" : 2,
        "avg_score" : 5.5
    },
    {
        "_id" : 1,
        "avg_score" : 4
    }

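This means that the second element of the pipeline sees a series of documents whose only keys are "_id" and "avg_score". The keys "category_id", "survey_id" and "score" no longer exist in this document stream, so "$survey_id" resolves to nothing in the second $group, which is what triggers the "must not be undefined" error.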

If you want to further aggregate on this stream, you'll have to aggregate using the keys that are seen at this stage in the pipeline. Since you want to average the averages, you need to put in a single constant value for the _id field, so that all of the input documents get grouped into a single result.
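To make the flow concrete without a running server, here is a rough sketch in plain JavaScript that imitates the two stages. The groupAvg helper is a made-up stand-in for a $group stage with a single $avg accumulator, not a MongoDB API, and the documents are trimmed to the fields that matter here.

```javascript
// Hypothetical stand-in for a $group stage with one $avg accumulator.
// keyOf picks the group key from a document; field names the value to average.
function groupAvg(docs, keyOf, field) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    const g = groups.get(key) || { sum: 0, count: 0 };
    g.sum += doc[field];
    g.count += 1;
    groups.set(key, g);
  }
  // Like $group, emit only _id plus the accumulated field.
  return [...groups].map(([key, g]) => ({ _id: key, avg_score: g.sum / g.count }));
}

// The sample documents from the question, without _id and name.
const questions = [
  { id: 1, category_id: 1, survey_id: 1, score: 5 },
  { id: 2, category_id: 1, survey_id: 1, score: 3 },
  { id: 3, category_id: 2, survey_id: 1, score: 4 },
  { id: 4, category_id: 2, survey_id: 1, score: 7 },
];

// Stage 1: one output document per category.
const perCategory = groupAvg(questions, (d) => d.category_id, "score");

// What stage 2 actually receives: survey_id and score are gone.
const survivingKeys = Object.keys(perCategory[0]);
const surveyIdSeenByStage2 = perCategory[0].survey_id;

// Stage 2: a constant key, averaging the field that stage 1 produced.
const overall = groupAvg(perCategory, () => "all", "avg_score");

console.log(perCategory);          // [ { _id: 1, avg_score: 4 }, { _id: 2, avg_score: 5.5 } ]
console.log(survivingKeys);        // [ '_id', 'avg_score' ]
console.log(surveyIdSeenByStage2); // undefined
console.log(overall);              // [ { _id: 'all', avg_score: 4.75 } ]
```

The undefined on the third line is the plain-JavaScript analogue of what the failing second $group ran into.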

The following code produces the correct result:

    db.questions.aggregate(
        { $group : {
            _id : "$category_id",
            avg_score : { $avg : "$score" },
        } },
        { $group : {
            _id : "all",
            avg_score : { $avg : "$avg_score" },
        } }
    );

When run, it produces the following output:

    {
        "result" : [
            {
                "_id" : "all",
                "avg_score" : 4.75
            }
        ],
        "ok" : 1
    }
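One thing to keep in mind when checking this result: in the sample data both categories contain two questions, so the average of the category averages (4.75) happens to equal the plain average of all four scores. The two numbers only diverge when categories differ in size. A small sketch in plain JavaScript, where the extra score of 1 in category 1 is a made-up value added purely for illustration:

```javascript
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Scores keyed by category_id, taken from the question's sample data.
const sample = { 1: [5, 3], 2: [4, 7] };
// Hypothetical variant: one extra score of 1 added to category 1.
const uneven = { 1: [5, 3, 1], 2: [4, 7] };

function compare(byCategory) {
  const lists = Object.values(byCategory);
  return {
    // Each category counts once, regardless of how many questions it has.
    averageOfAverages: mean(lists.map((scores) => mean(scores))),
    // Each question counts once.
    plainAverage: mean(lists.flat()),
  };
}

const sampleResult = compare(sample);
const unevenResult = compare(uneven);

console.log(sampleResult); // { averageOfAverages: 4.75, plainAverage: 4.75 }
console.log(unevenResult); // { averageOfAverages: 4.25, plainAverage: 4 }
```

So if you test the two-stage pipeline against a simpler single $avg over all documents, use data with unevenly sized categories, otherwise both approaches will agree by coincidence.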

answered Nov 02 '22 22:11

William Z
