Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB aggregate with conditional sums

I have to do a special mongoDB query.

I have a collection with documents like these:

{
    "_id" : ObjectId("53c7fd86d624b06abc76e8f6"),
    "works" : [ 
        {
            "code" : "A001",
            "name" : "Cambiar bombilla",
            "orderId" : "53c7fd86d624b06abc76e8f6",
            "price" : 1400,
            "ID" : 1,
            "lazyLoaded" : true,
            "status" : 0,
            "Date" : ISODate("2014-07-21T10:31:55.063Z"),
            "TechnicianId" : "538efd918163b19307c59e8e",
            "_id" : ObjectId("53ccec1bf2bf4d5952b2f205")
        }, 
        {
            "code" : "A005",
            "name" : "Cambiar bombilla 5",
            "price" : 1050,
            "type" : "Bombillas",
            "TechnicianId" : "5383577a994be8b9a9e3f01e",
            "_id" : ObjectId("53ccfdbdf2bf4d5952b2f206")
        }, 
        {
            "code" : "A004",
            "name" : "Cambiar bombilla 4",
            "price" : 1010,
            "type" : "Bombillas",
            "TechnicianId" : "5383577a994be8b9a9e3f01e",
            "date" : "2014-07-21T11:50:52.702Z",
            "orderId" : "53c7fd86d624b06abc76e8f6",
            "_id" : ObjectId("53ccfe9c109c100000ad688a")
        }, 
        {
            "code" : "A002",
            "name" : "Cambiar bombilla 2",
            "price" : 1030,
            "type" : "Bombillas",
            "TechnicianId" : "5383577a994be8b9a9e3f01e",
            "date" : "2014-07-21T11:57:37.065Z",
            "orderId" : "53c7fd86d624b06abc76e8f6",
            "_id" : ObjectId("53cd0036109c100000ad688b")
        }, 
        {
            "code" : "A003",
            "name" : "Cambiar bombilla 3",
            "price" : 1050,
            "type" : "Bombillas",
            "TechnicianId" : "5383577a994be8b9a9e3f01e",
            "date" : "2014-07-21T11:59:35.586Z",
            "orderId" : "53c7fd86d624b06abc76e8f6",
            "_id" : ObjectId("53cd00a7109c100000ad688c")
        }
    ],
    "Items" : [ 
        {
            "_id" : "534ba71f394835a7e51dd938",
            "total":50
            "qty" : 2
        }, 
        {
            "_id" : "534664b081362062015d1b77",
            "qty" : 2,
            "total":30
        }
    ]}

Now, I want to do a query to get the works with TechnicianId=5383577a994be8b9a9e3f01e , the sum of the "price" for these works deducting the sum of items.total, the quantity of the total works for these documents deducting the sum of items.total.

For this example I want something like this:

{
    "result" : [ 
        {
            "_id" : ObjectId("53c7fd86d624b06abc76e8f6"),
            "works.totalsumfortech" : 1050+1010+1030+1050 - (50+30),//Sum of works of this technicianId deducting the items.total qty
            "works.total":1400+1050+1010+1030+1050-(50+30)//Summ of all works of document deducting the items.total qty
        }
    ],
    "ok" : 1
}

This is my current query but I don't get the expected result..

db.orders.aggregate([
        { "$match": {
            "$and": [
                {"OrderState": {$in:['Review','Archived']}},
                {'works.TechnicianId':'5383577a994be8b9a9e3f01e'}
            ]
        }},
        {$group: {
         _id: "$_id",
         'total': {$subtract:[{$sum: { $cond: [ { $eq: [ "$works.TechnicianId", "5383577a994be8b9a9e3f01e" ] } , 2, 1 ]},{$sum: '$Items.total'}]
         'total': {$subtract:[{$sum: '$works.price'},{$sum: '$Items.total'}]
         }
         }]);
like image 911
colymore Avatar asked Jul 23 '14 10:07

colymore


Video Answer


1 Answers

Modern

This actually is a lot easier with modern releases of MongoDB. In the particular case there is no actual "aggregation" across documents, though there is a significant reduction in data returned and "aggregation within the document" to be applied.

The modern take on this allows this sort of reshaping and selection from data in arrays without resorting to $unwind and $group in order to process:

db.getCollection('orders').aggregate([
  { "$match": {
    //"OrderState": { "$in":["Review","Archived"]},
    "works.TechnicianId":"5383577a994be8b9a9e3f01e"
  }},
  { "$project": {
    "works": {
      "$let": {
        "vars": {
          "techTotal": {
            "$sum": {
              "$map": {
                "input": { 
                  "$filter": {
                    "input": "$works",
                    "cond": {
                      "$eq": [
                        "$$this.TechnicianId",
                        "5383577a994be8b9a9e3f01e"
                      ]
                    }
                  }
                },
                "in": "$$this.price"
              }    
            }
          },
          "items_total": { "$sum": "$Items.total" },
          "worksTotal": { "$sum": "$works.price" }
        },
        "in": {
          "totalSumForTech": {
            "$subtract": [ "$$techTotal", "$$items_total" ]
          },
          "total": {
            "$subtract": [ "$$worksTotal", "$$items_total" ]
          }
        }
      }
    }
  }}
])

The changes since originally asked are that $sum accepts an "array" as input when used in a $project or similar stage context in addition to the traditional role as an accumulator. So rather than "unwinding" an array, you can do things like { "$sum": "$Items.total" } which retuns an array of values from the specified property by the inner notation and then "reduces" those values via $sum. That's a big improvement on it's own.

Additional improvements are $map and $filter. Where the latter is applied in order to return just the matching entries of an array to a given condition, and for former allows "reshaping" of array contents. Both are common methods in other programming languages for dealing with arrays and have basically the same function here.

This means you can extract the "price" values from the "works" array matching the technician as is required, and then total those using $sum as an "array of values" in the same way as described earlier.

The other addition is $let, which allows a block to be declared with "variables" for usage within that block. In this case we can compute those "totals" from the arrays and then apply $subtract to the calculated values in order to come to the final result.

The advantage over earlier versions is you can do this without separating aggregation pipeline stages. And of course the $$items_total can be used here in place of repeating the full statement to calculate. Also the general separation of calculations makes the final output block a bit easier to read. So it's really just "using variables" in much the same way you do in regular programming.

The big gain here is this becomes simply $match and $project, and not a whole chain of pipeline stages just to get to the final calculated result from each document.


Original

As stated earlier, you need to use $unwind when working with arrays in MongoDB aggregation. The aggregation operations do not work on each array element unless you do this.

Other problems here are that within the $group pipeline stage all of the "top level" operations need to be group aggregation operators. Things that are not like $subtract are not allowed, so you need to do those operations either within something like $sum where possible or in another pipeline stage:

db.orders.aggregate([

    // Match documents "and" is implied in MongoDB. Not required unless
    // against the same field
    { "$match": {
        "OrderState": { "$in":["Review","Archived"]},
        "works.TechnicianId":"5383577a994be8b9a9e3f01e"
    }},

    // Unwind Items first
    { "$unwind": "$Items" },

    // Group to get that total
    { "$group": {
        "_id": "$_id",
        "works": { "$first": "$works" },
        "items_total": { "$sum": "$Items.total" }
    }},

    // Unwind works to "de-normalize"
    { "$unwind": "$works" },

    // Group conditionally on "TechnicianId" and the full total
    { "$group": {
        "_id": "$_id",
        "techTotal": {
            "$sum": {
                "$cond": [ 
                    { "$eq": [ 
                        "$works.TechnicianId", 
                        "5383577a994be8b9a9e3f01e"
                    ]},
                    "$works.price",
                    0 
                ]
            }
        },
        "worksTotal": { "$sum": "$works.price" },
        "items_total": { "$first": "$items_total" }
    }},

    // Project to do math and other re-shaping
    { "$project": {
        "works": {
            "totalSumForTech": {
                "$subtract": [ "$techTotal", "$items_total" ]
            },
            "total": {
                "$subtract": [ "$worksTotal", "$items_total" ]
            }
        }
    }}
])

On the sample document (though I need to drop the $match since that data does not exist in your sample ) the results are:

{
    "_id" : ObjectId("53c7fd86d624b06abc76e8f6"),
    "works" : {
            "totalSumForTech" : 4060,
            "total" : 5460
    }
}
like image 159
Neil Lunn Avatar answered Sep 22 '22 19:09

Neil Lunn