
Multiple group operations using Mongo aggregation framework

Given a set of questions, each linked to a survey and a category by id:

    > db.questions.find().toArray();
    [
        {
            "_id" : ObjectId("4fda05bc322b1c95b531ac25"),
            "id" : 1,
            "name" : "Question 1",
            "category_id" : 1,
            "survey_id" : 1,
            "score" : 5
        },
        {
            "_id" : ObjectId("4fda05cb322b1c95b531ac26"),
            "id" : 2,
            "name" : "Question 2",
            "category_id" : 1,
            "survey_id" : 1,
            "score" : 3
        },
        {
            "_id" : ObjectId("4fda05d9322b1c95b531ac27"),
            "id" : 3,
            "name" : "Question 3",
            "category_id" : 2,
            "survey_id" : 1,
            "score" : 4
        },
        {
            "_id" : ObjectId("4fda4287322b1c95b531ac28"),
            "id" : 4,
            "name" : "Question 4",
            "category_id" : 2,
            "survey_id" : 1,
            "score" : 7
        }
    ]

I can find the category average with:

    db.questions.aggregate(
        { $group : {
            _id : "$category_id",
            avg_score : { $avg : "$score" }
        } }
    );

    {
        "result" : [
            {
                "_id" : 1,
                "avg_score" : 4
            },
            {
                "_id" : 2,
                "avg_score" : 5.5
            }
        ],
        "ok" : 1
    }

How can I get the average of the category averages (note that this is different from simply averaging all questions)? I assumed I would chain multiple group operations, but this fails:

    > db.questions.aggregate(
    ...   { $group : {
    ...     _id : "$category_id",
    ...     avg_score : { $avg : "$score" },
    ...   }},
    ...   { $group : {
    ...     _id : "$survey_id",
    ...     avg_score : { $avg : "$score" },
    ...   }}
    ... );
    {
        "errmsg" : "exception: the _id field for a group must not be undefined",
        "code" : 15956,
        "ok" : 0
    }
Allyl Isocyanate asked Jun 14 '12 20:06




1 Answer

It's important to understand that the operations in the argument to aggregate() form a pipeline. This means that the input to each stage of the pipeline is the stream of documents produced by the previous stage.

In your example, the first $group stage produces a stream of documents that look like this:

    {
        "_id" : 2,
        "avg_score" : 5.5
    },
    {
        "_id" : 1,
        "avg_score" : 4
    }

This means that the second stage of the pipeline sees a series of documents whose only keys are "_id" and "avg_score". The keys "category_id", "survey_id" and "score" no longer exist in this document stream, which is why grouping on "$survey_id" fails with an undefined _id.
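To make the shape of that intermediate stream concrete, here is a minimal sketch in plain JavaScript that imitates a single $group stage on an in-memory array. It does not talk to MongoDB at all; `groupAvg` is a hypothetical helper written for this illustration, and the documents are the four questions from the post with the "_id" and "name" fields left out.

```javascript
// Plain-JavaScript imitation of a $group stage with one $avg accumulator.
// groupAvg is a hypothetical helper for illustration, not a MongoDB API.
function groupAvg(docs, keyField, valueField) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField]; // undefined when the field is missing
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc[valueField]);
  }
  // Each output document carries only the group key and the computed average.
  return [...buckets].map(([key, values]) => ({
    _id: key,
    avg_score: values.reduce((sum, v) => sum + v, 0) / values.length,
  }));
}

const questions = [
  { id: 1, category_id: 1, survey_id: 1, score: 5 },
  { id: 2, category_id: 1, survey_id: 1, score: 3 },
  { id: 3, category_id: 2, survey_id: 1, score: 4 },
  { id: 4, category_id: 2, survey_id: 1, score: 7 },
];

// Stage 1: group by category_id and average the scores.
const stage1 = groupAvg(questions, "category_id", "score");

// The only keys a second stage can refer to.
const stage2Keys = Object.keys(stage1[0]);

// What a second stage keyed on survey_id would read from each document.
const surveyKeys = stage1.map((doc) => doc.survey_id);

console.log(stage1);     // two documents: averages 4 and 5.5
console.log(stage2Keys); // [ '_id', 'avg_score' ]
console.log(surveyKeys); // [ undefined, undefined ]
```

The last line is the in-memory analogue of the "must not be undefined" error: every document reaching the second stage lacks the field used as the group key.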

If you want to further aggregate on this stream, you'll have to aggregate using the keys that are seen at this stage in the pipeline. Since you want to average the averages, you need to put in a single constant value for the _id field, so that all of the input documents get grouped into a single result.
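As a side note on why averaging the averages is a separate computation from averaging every score, the arithmetic can be sketched in plain JavaScript. The first data set is the question's own scores; the second adds one extra score of 9 to category 2, which is a hypothetical value invented only to make the group sizes unequal.

```javascript
// Arithmetic mean of a non-empty array of numbers.
const avg = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Scores from the question, keyed by category_id.
const byCategory = { 1: [5, 3], 2: [4, 7] };

const avgOfAverages = avg(Object.values(byCategory).map(avg)); // (4 + 5.5) / 2
const avgOfAll = avg(Object.values(byCategory).flat());        // 19 / 4

// Hypothetical: category 2 gains one more question with score 9.
const uneven = { 1: [5, 3], 2: [4, 7, 9] };

const unevenAvgOfAverages = avg(Object.values(uneven).map(avg)); // (4 + 20/3) / 2
const unevenAvgOfAll = avg(Object.values(uneven).flat());        // 28 / 5

console.log(avgOfAverages, avgOfAll);
console.log(unevenAvgOfAverages, unevenAvgOfAll);
```

With the sample data both numbers come out as 4.75, because the two categories hold the same number of questions. Once the categories differ in size, as in the hypothetical second data set, the two results diverge, so the two-stage grouping is still the right tool.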

The following code produces the correct result:

    db.questions.aggregate(
        { $group : {
            _id : "$category_id",
            avg_score : { $avg : "$score" }
        } },
        { $group : {
            _id : "all",
            avg_score : { $avg : "$avg_score" }
        } }
    );

When run, it produces the following output:

    {
        "result" : [
            {
                "_id" : "all",
                "avg_score" : 4.75
            }
        ],
        "ok" : 1
    }
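If the goal is one average of category averages per survey, as the second $group in the question suggests, one possible variant is to carry "survey_id" through the first stage inside a compound _id and then group on it. The sketch below is an assumption about what the asker wanted, not something confirmed in the question; the variable name `pipeline` is made up for the example, and the stages are passed as an array, which is the form current MongoDB documentation shows.

```javascript
// Assumed variant: keep survey_id available to the second stage
// by making it part of the first stage's compound group key.
const pipeline = [
  // Stage 1: one document per (survey, category) pair with its average score.
  { $group: {
      _id: { survey_id: "$survey_id", category_id: "$category_id" },
      avg_score: { $avg: "$score" }
  } },
  // Stage 2: average those category averages within each survey.
  { $group: {
      _id: "$_id.survey_id",
      avg_score: { $avg: "$avg_score" }
  } }
];

// In the mongo shell this would be run as:
//   db.questions.aggregate(pipeline)
```

For the four sample questions, which all belong to survey 1, this should yield a single document with "_id" 1 and "avg_score" 4.75, matching the constant-key result above.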
William Z answered Nov 02 '22 22:11
