Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining group and project in mongoDB aggregation framework

my document looks like this:

{ 
  "_id" : ObjectId("5748d1e2498ea908d588b65e"), 
  "some_item" : {
    "_id" : ObjectId("5693afb1b49eb7d5ed97de14"), 
    "item_property_1" : 1.0, 
    "item_property_2" : 2.0,
  }, 
  "timestamp" : "2016-05-28",  
  "price_information" : {
    "arbitrary_value" : 111, 
    "hourly_rates" : [
        {
            "price" : 74.45, 
            "hour" : "0"
        }, 
        {
            "price" : 74.45, 
            "hour" : "1"
        }, 
        {
            "price" : 74.45, 
            "hour" : "2"
        }, 
    ]
  }
}

I did average the price per day via:

db.hourly.aggregate([
  {$match: {timestamp : "2016-05-28"}},
  {$unwind: "$price_information.hourly_rates"},
  {$group: { _id: "$unique_item_identifier", total_price: { $avg: "$price_information.hourly_rates.price"}}}
]);

I am struggling with bringing (projecting) other params with in the result set. I would like to have also some_item and timestampin the result set. I tried to use a $project: {some_item: 1, total_price: 1, ...} within the query, but that wasn't right.

My desired output would be like:

{ 
  "_id" : ObjectId("5693afb1b49eb7d5ed97de27"), 
  "someItem" : {
    "_id" : ObjectId("5693afb1b49eb7d5ed97de14"), 
    "item_property_1" : 1.0, 
    "item_property_2" : 2.0,
  }, 
  "timestamp" : "2016-05-28",  
  "price_information" : {
    "avg_price": 34
  }
}

If somebody could give me a hint, how to project the grouping and the other params into the result set, I would be thankful.

Best Rob

like image 619
Robert Kirsten Avatar asked May 31 '16 14:05

Robert Kirsten


1 Answers

If using MongoDB 3.2 and newer, you can use $avg in the $project pipeline since it returns the average of the specified expression or list of expressions for each document e.g

db.hourly.aggregate([
    { "$match": { "timestamp": "2016-05-28" } },
    {
        "$project": { 
            "price_information": { 
                "avg_price": { "$avg": "$price_information.hourly_rates.price" }
            },
            "someItem": 1,
            "timestamp": 1,
        }
    }
]);

In previous versions of MongoDB, $avg is available in the $group stage only. So to include the other fields, use the $first operator in your grouping:

db.hourly.aggregate([
    { "$match": { "timestamp": "2016-05-28" } },
    { "$unwind": "$price_information.hourly_rates" },
    {
        "$group": { 
            "_id": "$_id", 
            "avg_price": { "$avg": "$price_information.hourly_rates.price" },
            "someItem": { "$first": "$some_item" },
            "timestamp": { "$first": "$timestamp" },
        }
    },
    {
        "$project": {
            "price_information": { "avg_price": "$avg_price" },
            "someItem": 1
            "timestamp": 1
        }
    }
]);

Note: Usage of the $first operator in a $group stage will largely depend on how the documents getting in that pipeline are ordered as well as the group by key. Because $first will returns the first document value in a group of documents that share the same group by key, the $group stage logically should precede a $sort stage to have the input documents in a defined order. This is only sensible to use when you know the order that the data is being processed in.

However, as the above is grouping by the main document's _id key, the $first operator when applied to non-denormalized fields (and not the flattened price_information array fields) will guarantee the original value in the result. Hence no need for a pre-sort stage to define the order since it won't be necessary in this case.

like image 138
chridam Avatar answered Oct 21 '22 22:10

chridam