Mongoose aggregation "$sum" of rows in sub document

I'm fairly good with SQL queries, but I can't seem to get my head around grouping and summing MongoDB documents.

With this in mind, I have a job model with a schema like the one below:

    {
        name: {
            type: String,
            required: true
        },
        info: String,
        active: {
            type: Boolean,
            default: true
        },
        all_service: [{
            price: {
                type: Number,
                min: 0,
                required: true
            },
            all_sub_item: [{
                name: String,
                price:{ // << -- this is the price I want to calculate
                    type: Number,
                    min: 0
                },
                owner: {
                    user_id: {  //  <<-- here is the filter I want to put
                        type: Schema.Types.ObjectId,
                        required: true
                    },
                    name: String,
                    ...
                }
            }]
        }],
        date_create: {
            type: Date,
            default : Date.now
        },
        date_update: {
            type: Date,
            default : Date.now
        }
    }

I would like to get the sum of the price field for the sub items that belong to a given owner. I tried the following, but no luck:

    Job.aggregate(
        [
            {
                $group: {
                    _id: {}, // not sure what to put here
                    amount: { $sum: '$all_service.all_sub_item.price' }
                },
                $match: {'not sure how to limit the user': given_user_id}
            }
        ],
        //{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
        function(err, summary) {
            console.log(err);
            console.log(summary);
        }
    );

Could someone guide me in the right direction? Thank you in advance.

823

asked Jul 15 '15 17:07

Developerium

1 Answer

Primer

As is correctly noted earlier, it does help to think of an aggregation "pipeline" just as the "pipe" | operator from Unix and other system shells. One "stage" feeds input to the "next" stage and so on.

The thing you need to be careful with here is that you have "nested" arrays, one array within another, and this can make drastic differences to your expected results if you are not careful.

Your documents consist of an "all_service" array at the top level. Presumably there are often "multiple" entries here, all containing your "price" property as well as "all_sub_item". Then of course "all_sub_item" is an array in itself, also containing many items of its own.

You can think of these arrays as the "relations" between your tables in SQL, in each case a "one-to-many". But the data is in a "pre-joined" form, where you can fetch all data at once without performing joins. That much you should already be familiar with.

However, when you want to "aggregate" across documents, you need to "de-normalize" this in much the same way as in SQL by "defining" the "joins". This is to "transform" the data into a de-normalized state that is suitable for aggregation.

So the same visualization applies. A master document's entries are replicated by the number of child documents, and a "join" to an "inner-child" will replicate both the master and initial "child" accordingly. In a "nutshell", this:

{
    "a": 1,
    "b": [
        { 
            "c": 1,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        },
        { 
            "c": 2,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        }
    ]
}

Becomes this:

{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }

And the operation to do this is $unwind, and since there are multiple arrays then you need to $unwind both of them before continuing any processing:

db.collection.aggregate([
    { "$unwind": "$b" },
    { "$unwind": "$b.d" }
])

So the first stage "pipes" out the array from "$b" un-wound like so:

{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }

Which leaves a second array referenced by "$b.d" to be further de-normalized into the final result "without any arrays". This allows other operations to process.
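To see the two steps outside of the database, here is a rough plain-JavaScript emulation of $unwind run against the sample document above. The `unwind`, `getPath` and `setPath` helpers are hypothetical names made up for this sketch and are not part of MongoDB or Mongoose. The sketch also only handles the simple case where the path holds an array, so take it as an illustration of the idea rather than the real operator:

```javascript
// Read a dotted path such as "b.d" from a plain object.
function getPath(obj, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]), obj);
}

// Return a copy of obj with the value at the dotted path replaced.
function setPath(obj, path, value) {
  const [head, ...rest] = path.split('.');
  if (rest.length === 0) return { ...obj, [head]: value };
  return { ...obj, [head]: setPath(obj[head], rest.join('.'), value) };
}

// Simplified $unwind: one output document per array element.
// Documents where the path is not an array are dropped in this sketch.
function unwind(docs, path) {
  return docs.flatMap((doc) => {
    const arr = getPath(doc, path);
    if (!Array.isArray(arr)) return [];
    return arr.map((item) => setPath(doc, path, item));
  });
}

const doc = {
  a: 1,
  b: [
    { c: 1, d: [{ e: 1 }, { e: 2 }] },
    { c: 2, d: [{ e: 1 }, { e: 2 }] }
  ]
};

const afterFirst = unwind([doc], 'b');          // 2 documents, "d" still an array
const afterSecond = unwind(afterFirst, 'b.d');  // 4 documents, no arrays left

afterSecond.forEach((d) => console.log(JSON.stringify(d)));
```

The parent fields ("a" and "b.c") are copied into every output row, which is exactly the "replication" that affects your totals if you sum a parent-level field after unwinding.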

Solving

With just about "every" aggregation pipeline, the "first" thing you want to do is "filter" the documents to only those that contain your results. This is a good idea, especially with operations such as $unwind, since you don't want to be doing that on documents that do not even match your target data.

So you need to match your "user_id" at the array depth. But this is only part of getting the result, since you should be aware of what happens when you query a document for a matching value in an array.

Of course, the "whole" document is still returned, because this is what you really asked for. The data is already "joined" and we haven't asked to "un-join" it in any way. You can look at this just as a "first" document selection, but once "de-normalized", every array element now actually represents a "document" in itself.

So not "only" do you $match at the beginning of the "pipeline", you also $match after you have processed "all" $unwind statements, down to the level of the element you wish to match.

Job.aggregate(
    [
        // Match to filter possible "documents".
        // Note: aggregate() does not cast values, so given_user_id must be an ObjectId
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},

        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_sub_item" },

        // Match again to filter the array elements
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}

    ],
    function(err,results) {

    }
)
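To make the need for the "second" $match concrete, here is a small plain-JavaScript walk through the same stages on made-up data. Everything in it is an assumption for illustration: the `jobs` array, the string ids "u1" and "u2" standing in for ObjectId values, and the `ownedBy` helper. It only mimics the pipeline with array methods and does not talk to MongoDB:

```javascript
// Hypothetical in-memory data shaped like the schema in the question.
const jobs = [
  {
    name: 'job 1',
    all_service: [
      { price: 100, all_sub_item: [
        { name: 'x', price: 10, owner: { user_id: 'u1' } },
        { name: 'y', price: 20, owner: { user_id: 'u2' } }
      ]},
      { price: 50, all_sub_item: [
        { name: 'z', price: 5, owner: { user_id: 'u1' } }
      ]}
    ]
  },
  {
    name: 'job 2',
    all_service: [
      { price: 70, all_sub_item: [
        { name: 'w', price: 40, owner: { user_id: 'u2' } }
      ]}
    ]
  }
];

const givenUserId = 'u1';
const ownedBy = (item) => Boolean(item.owner) && item.owner.user_id === givenUserId;

// First $match: keep whole documents containing at least one matching sub item.
const matchedDocs = jobs.filter((job) =>
  job.all_service.some((service) => service.all_sub_item.some(ownedBy)));

// Two $unwind stages: one row per sub item.
const rows = matchedDocs.flatMap((job) =>
  job.all_service.flatMap((service) => service.all_sub_item));

// Summing right away still counts the other owner's item inside "job 1".
const totalWithoutSecondMatch = rows.reduce((sum, item) => sum + item.price, 0);

// Second $match, then $group with $sum.
const total = rows.filter(ownedBy).reduce((sum, item) => sum + item.price, 0);

console.log(totalWithoutSecondMatch); // 35
console.log(total);                   // 15
```

The first filter only removes "job 2" as a whole. The item owned by "u2" inside "job 1" survives until the rows are filtered again, which is why the totals differ.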

Alternately, modern MongoDB releases since 2.6 also support the $redact operator. This could be used in this case to "pre-filter" the array content before processing with $unwind:

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},

        // Filter arrays for matches in document
        { "$redact": {
            "$cond": {
                "if": {
                    "$eq": [
                        { "$ifNull": [ "$owner.user_id", given_user_id ] },
                        given_user_id
                    ]
                },
                "then": "$$DESCEND",
                "else": "$$PRUNE"
            }
        }},

        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_sub_item" },

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}

    ],
    function(err,results) {

    }
)

That can "recursively" traverse the document and test for the condition, effectively removing any "un-matched" array elements before you even $unwind. This can speed things up a bit since items that do not match would not need to be "un-wound". However there is a "catch" in that if for some reason the "owner" did not exist on an array element at all, then the logic required here would count that as another "match". You can always $match again to be sure, but there is still a more efficient way to do this:

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},

        // Filter arrays for matches in document
        { "$project": {
            "all_items": {
              "$setDifference": [
                { "$map": {
                  "input": "$all_service",
                  "as": "A",
                  "in": {
                    "$setDifference": [
                      { "$map": {
                        "input": "$$A.all_sub_item",
                        "as": "B",
                        "in": {
                          "$cond": {
                            "if": { "$eq": [ "$$B.owner.user_id", given_user_id ] },
                            "then": "$$B",
                            "else": false
                          }
                        }
                      }},
                      [false]
                    ]
                  }
                }},
                [[]]
              ]
            }
        }},


        // De-normalize the "two" level array. "Double" $unwind
        { "$unwind": "$all_items" },
        { "$unwind": "$all_items" },

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_items.price" }
        }}

    ],
    function(err,results) {

    }
)

That process cuts down the size of the items in both arrays "drastically" compared to $redact. The $map operator processes each element of an array with the statement given in "in". In this case, each "outer" array element is sent to another $map to process the "inner" elements.

A logical test is performed here with $cond whereby if the "condition" is met then the "inner" array element is returned, otherwise the false value is returned.

The $setDifference is used to filter down any false values that are returned. Or as in the "outer" case, any "blank" arrays resulting from all false values being filtered from the "inner" where there is no match there. This leaves just the matching items, encased in a "double" array, e.g:

[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]

As "all" array elements have an _id by default with mongoose (and this is a good reason why you keep that) then every item is "distinct" and not affected by the "set" operator, apart from removing the un-matched values.

Process $unwind "twice" to convert these into plain objects in their own documents, suitable for aggregation.
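If the nested $map and $setDifference are hard to follow, the same shape can be sketched in plain JavaScript. The sample `allService` data and the string ids are assumptions made up for the sketch, and `filter` here only approximates $setDifference: it drops the false values and the empty arrays, but it does not remove duplicates the way a real "set" operator would:

```javascript
const givenUserId = 'u1';

// Hypothetical "all_service" array of a single job document.
const allService = [
  { all_sub_item: [
    { _id: 1, price: 10, owner: { user_id: 'u1' } },
    { _id: 2, price: 20, owner: { user_id: 'u2' } }
  ]},
  { all_sub_item: [
    { _id: 3, price: 40, owner: { user_id: 'u2' } }
  ]},
  { all_sub_item: [
    { _id: 4, price: 5, owner: { user_id: 'u1' } }
  ]}
];

const allItems = allService
  .map((service) => service.all_sub_item
    // Inner $map with $cond: keep the item or swap it for false.
    .map((item) => (item.owner.user_id === givenUserId ? item : false))
    // Inner $setDifference with [false]: drop the false values.
    .filter((item) => item !== false))
  // Outer $setDifference with [[]]: drop inner arrays left empty.
  .filter((inner) => inner.length > 0);

// The "double" $unwind flattens the array of arrays, then $group sums the prices.
const total = allItems.flat().reduce((sum, item) => sum + item.price, 0);

console.log(JSON.stringify(allItems));
console.log(total); // 15
```

The second service has no matching items, so its inner array ends up empty and is removed by the outer step, leaving a two-element "double" array before flattening.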

So those are the things you need to know. As I stated earlier, be "aware" of how the data "de-normalizes" and what that implies towards your end totals.

177

answered Oct 06 '22 18:10

Blakes Seven

Tags:

node.js

mongodb

mongodb-query

mongoose

aggregation-framework