Given a set of questions, each linked to a survey and a category by id:
> db.questions.find().toArray();
[
    { "_id" : ObjectId("4fda05bc322b1c95b531ac25"), "id" : 1, "name" : "Question 1", "category_id" : 1, "survey_id" : 1, "score" : 5 },
    { "_id" : ObjectId("4fda05cb322b1c95b531ac26"), "id" : 2, "name" : "Question 2", "category_id" : 1, "survey_id" : 1, "score" : 3 },
    { "_id" : ObjectId("4fda05d9322b1c95b531ac27"), "id" : 3, "name" : "Question 3", "category_id" : 2, "survey_id" : 1, "score" : 4 },
    { "_id" : ObjectId("4fda4287322b1c95b531ac28"), "id" : 4, "name" : "Question 4", "category_id" : 2, "survey_id" : 1, "score" : 7 }
]
I can find the category average with:
db.questions.aggregate(
    { $group : {
        _id : "$category_id",
        avg_score : { $avg : "$score" }
    }}
);
{
    "result" : [
        { "_id" : 1, "avg_score" : 4 },
        { "_id" : 2, "avg_score" : 5.5 }
    ],
    "ok" : 1
}
How can I get the average of the category averages (note that this is different from simply averaging all questions)? I assumed I would chain multiple group operations, but this fails:
> db.questions.aggregate(
... { $group : {
...     _id : "$category_id",
...     avg_score : { $avg : "$score" },
... }},
... { $group : {
...     _id : "$survey_id",
...     avg_score : { $avg : "$score" },
... }}
... );
{
    "errmsg" : "exception: the _id field for a group must not be undefined",
    "code" : 15956,
    "ok" : 0
}
MongoDB group by multiple fields using the aggregate operation

First, the key on which the grouping is based is selected, and then the collection is divided into groups according to the value of that key. A final document is then created by aggregating the documents in each group.
A group key is often a field, or group of fields. The group key can also be the result of an expression. Use the _id field in the $group pipeline stage to set the group key.
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB. The aggregation pipeline can operate on a sharded collection. The aggregation pipeline can use indexes to improve its performance during some of its stages.
The output can be one or more documents. MongoDB offers very powerful aggregation operations that can be divided into three categories. The $group operator is an aggregation stage that returns new documents. Inside it, a field of the current document is referenced by prefixing the field name with the $ symbol.
Grouping by multiple fields in MongoDB collects documents that share the same combination of field values. This is done with the $group operator, which can also apply other aggregation functions to the grouped data.
To understand grouping by multiple fields, first look at the operators that can be used in $group: $sum returns the sum of numeric values.
Starting in version 5.2, MongoDB uses the slot-based execution query engine to execute $group stages if either $group is the first stage in the pipeline or all preceding stages in the pipeline can also be executed by the slot-based engine.
It's important to understand that the operations in the argument to aggregate() form a pipeline. This means that the input to any element of the pipeline is the stream of documents produced by the previous element in the pipeline.
In your example, the first $group stage produces a stream of documents that look like this:
{ "_id" : 2, "avg_score" : 5.5 }, { "_id" : 1, "avg_score" : 4 }
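This means that the second element of the pipeline sees a series of documents whose only keys are "_id" and "avg_score". The keys "category_id", "survey_id", and "score" no longer exist in this document stream, so "$survey_id" resolves to nothing in the second $group, which is why the error says the _id field must not be undefined.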
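To make the "stream of documents" idea concrete, here is a rough plain-JavaScript stand-in for the first $group stage, run on the sample data from the question. The groupAvg helper is hypothetical and only mimics a single $avg accumulator; it is not part of MongoDB.

```javascript
// Hypothetical helper: imitates { $group: { _id: "$<keyField>", avg_score: { $avg: "$<valueField>" } } }
// on an in-memory array. It is an illustration, not a MongoDB API.
function groupAvg(docs, keyField, valueField) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc[valueField]);
  }
  // Each output document carries only _id and avg_score, like $group output.
  return [...buckets].map(([key, values]) => ({
    _id: key,
    avg_score: values.reduce((a, b) => a + b, 0) / values.length,
  }));
}

const questions = [
  { id: 1, category_id: 1, survey_id: 1, score: 5 },
  { id: 2, category_id: 1, survey_id: 1, score: 3 },
  { id: 3, category_id: 2, survey_id: 1, score: 4 },
  { id: 4, category_id: 2, survey_id: 1, score: 7 },
];

const afterFirstStage = groupAvg(questions, "category_id", "score");
console.log(afterFirstStage);                // two documents, one per category
console.log(Object.keys(afterFirstStage[0])); // only "_id" and "avg_score" remain
console.log(afterFirstStage[0].survey_id);    // undefined: the field was not carried through
```

Anything the next stage needs has to be produced explicitly by the stage before it.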
If you want to further aggregate on this stream, you'll have to aggregate using the keys that are seen at this stage in the pipeline. Since you want to average the averages, you need to put in a single constant value for the _id field, so that all of the input documents get grouped into a single result.
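If you would rather get one result per survey than a single constant group, one option is to carry survey_id through the first stage as part of a compound _id and then group on it. This is only a sketch, assuming the field names from the question; the pipeline variable is just an illustrative name, and the aggregate() call is left as a comment because it needs a live database.

```javascript
// Sketch: keep survey_id alive by making it part of the first stage's group key.
const pipeline = [
  {
    // Stage 1: one document per (survey, category) pair with that category's average.
    $group: {
      _id: { survey_id: "$survey_id", category_id: "$category_id" },
      avg_score: { $avg: "$score" },
    },
  },
  {
    // Stage 2: average the category averages within each survey.
    // "$_id.survey_id" reads the survey_id stored inside the compound _id above.
    $group: {
      _id: "$_id.survey_id",
      avg_score: { $avg: "$avg_score" },
    },
  },
];

// In the mongo shell this would be run as:
//   db.questions.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

The second stage only refers to "_id" and "avg_score", which are exactly the keys the first stage emits.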
The following code produces the correct result:
db.questions.aggregate(
    { $group : {
        _id : "$category_id",
        avg_score : { $avg : "$score" },
    }},
    { $group : {
        _id : "all",
        avg_score : { $avg : "$avg_score" },
    }}
);
When run, it produces the following output:
{
    "result" : [
        { "_id" : "all", "avg_score" : 4.75 }
    ],
    "ok" : 1
}
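As a sanity check on the arithmetic, the following plain-JavaScript sketch recomputes the number by hand. With this sample data the average of averages happens to equal the plain average of all scores, because both categories contain two questions. The uneven data set is made up purely to show a case where the two differ.

```javascript
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Scores from the question, keyed by category_id.
const categories = { 1: [5, 3], 2: [4, 7] };
const averageOfAverages = mean(Object.values(categories).map(mean)); // (4 + 5.5) / 2
const overallAverage = mean(Object.values(categories).flat());       // (5 + 3 + 4 + 7) / 4
console.log(averageOfAverages, overallAverage); // 4.75 4.75

// Hypothetical data with unequal category sizes: the two measures diverge.
const uneven = { 1: [5, 3, 1], 2: [7] };
const unevenAverageOfAverages = mean(Object.values(uneven).map(mean)); // (3 + 7) / 2
const unevenOverallAverage = mean(Object.values(uneven).flat());       // 16 / 4
console.log(unevenAverageOfAverages, unevenOverallAverage); // 5 4
```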