Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate the average of fields in embedded documents/array

I want to calculate the rating_average field of this object with the rating fields inside the array ratings. Can you help me to understand how to use aggregation with $avg?

{
    "title": "The Hobbit",
    "rating_average": "???",
    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
}
like image 407
retrobitguy Avatar asked May 17 '15 16:05

retrobitguy


3 Answers

The aggregation framework in MongoDB 3.4 and newer offers the $reduce operator which efficiently calculates the total without the need for extra pipelines. Consider using it as an expression to return the total ratings and get the number of ratings using $size. Together with $addFields, the average can thus be calculated using the arithmetic operator $divide as in the formula average = total ratings/number of ratings:

db.collection.aggregate([
    { 
        "$addFields": { 
            "rating_average": {
                "$divide": [
                    { // expression returns total
                        "$reduce": {
                            "input": "$ratings",
                            "initialValue": 0,
                            "in": { "$add": ["$$value", "$$this.rating"] }
                        }
                    },
                    { // expression returns ratings count
                        "$cond": [
                            { "$ne": [ { "$size": "$ratings" }, 0 ] },
                            { "$size": "$ratings" }, 
                            1
                        ]
                    }
                ]
            }
        }
    }           
])

Sample Output

{
    "_id" : ObjectId("58ab48556da32ab5198623f4"),
    "title" : "The Hobbit",
    "ratings" : [ 
        {
            "title" : "best book ever",
            "rating" : 5.0
        }, 
        {
            "title" : "good book",
            "rating" : 3.5
        }
    ],
    "rating_average" : 4.25
}

With older versions, you would need to first apply the $unwind operator on the ratings array field first as your initial aggregation pipeline step. This will deconstruct the ratings array field from the input documents to output a document for each element. Each output document replaces the array with an element value.

The second pipeline stage would be the $group operator which groups input documents by the _id and title keys identifier expression and applies the desired $avg accumulator expression to each group that calculates the average. There is another accumulator operator $push that preserves the original ratings array field by returning an array of all values that result from applying an expression to each document in the above group.

The final pipeline step is the $project operator which then reshapes each document in the stream, such as by adding the new field ratings_average.

So, if for instance you have a sample document in your collection (as from above and so below):

db.collection.insert({
    "title": "The Hobbit",

    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
})

To calculate the ratings array average and projecting the value in another field ratings_average, you can then apply the following aggregation pipeline:

db.collection.aggregate([
    {
        "$unwind": "$ratings"
    },
    {
        "$group": {
            "_id": {
                "_id": "$_id",
                "title": "$title"
            },
            "ratings":{
                "$push": "$ratings"
            },
            "ratings_average": {
                "$avg": "$ratings.rating"
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "title": "$_id.title",
            "ratings_average": 1,
            "ratings": 1
        }
    }
])

Result:

/* 1 */
{
    "result" : [ 
        {
            "ratings" : [ 
                {
                    "title" : "best book ever",
                    "rating" : 5
                }, 
                {
                    "title" : "good book",
                    "rating" : 3.5
                }
            ],
            "ratings_average" : 4.25,
            "title" : "The Hobbit"
        }
    ],
    "ok" : 1
}
like image 67
chridam Avatar answered Nov 13 '22 07:11

chridam


This really could be written so much shorter, and this was even true at the time of writing. If you want an "average" simply use $avg:

db.collection.aggregate([
  { "$addFields": {
    "rating_average": { "$avg": "$ratings.rating" }
  }}
])

The reason for this is that as of MongoDB 3.2 the $avg operator gained "two" things:

  1. The ability to process an "array" of arguments in a "expression" form rather than solely as an accumulator to $group

  2. Benefits from the features of MongoDB 3.2 that allowed the "shorthand" notation of array expressions. Being either in composition:

    { "array": [ "$fielda", "$fieldb" ] }
    

    or in notating a single property from the array as an array of the values of that property:

    { "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] }
    

In earlier releases you would have to use $map in order to access the "rating" property inside each array element. Now you don't.


For the record, even the $reduce usage can be simplified:

db.collection.aggregate([
  { "$addFields": {
    "rating_average": {
      "$reduce": {
        "input": "$ratings",
        "initialValue": 0,
        "in": {
          "$add": [ 
            "$$value",
            { "$divide": [ 
              "$$this.rating", 
              { "$size": { "$ifNull": [ "$ratings", [] ] } }
            ]}
          ]
        }
      }
    }
  }}
])

Yes as stated, this is really just re-implementing the existing $avg functionality, and therefore since that operator is available then it is the one that should be used.

like image 40
Neil Lunn Avatar answered Nov 13 '22 09:11

Neil Lunn


As you have your to-be-calculated-average data in an array, first you need to unwind it. Do it by using the $unwind in your aggregation pipeline:

{$unwind: "$ratings"}

Then you may access each element of the array as an embedded document with key ratings in the result documents of the aggregation. Then you just need to $group by title and calculate $avg:

{$group: {_id: "$title", ratings: {$push: "$ratings"}, average: {$avg: "$ratings.rating"}}}

Then just recover your title field:

{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}

So here is your result aggregation pipeline:

db.yourCollection.aggregate([
                               {$unwind: "$ratings"}, 
                               {$group: {_id: "$title", 
                                         ratings: {$push: "$ratings"}, 
                                         average: {$avg: "$ratings.rating"}
                                        }
                               },
                               {$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
                            ])
like image 44
bagrat Avatar answered Nov 13 '22 08:11

bagrat