Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB aggregate/grouping by key-value pairs

My data looks something like this:

    { 
            "_id" : "9aa072e4-b706-47e6-9607-1a39e904a05a", 
            "customerId" : "2164289-4", 
            "channelStatuses" : {
                    "FOO" : {
                    "status" : "done"
                    }, 
                    "BAR" : {
                    "status" : "error"
                    }
            }, 
            "channel" : "BAR", 
    }

My aggregate/group looks like this:

    { 
            "_id" : {
                    "customerId" : "$customerId", 
                    "channel" : "$channel", 
                    "status" : "$channelStatuses[$channel].status"
            }, 
                    "count" : {
                    "$sum" : 1
            }
    }

So basically with the example data the group should give me a group grouped by:

   {"customerId": "2164289-4", "channel": "BAR", "status": "error"}

But I cannot use []-indexing in a aggregate/group. What should I do instead?

like image 222
Cihan Bebek Avatar asked Oct 31 '22 07:10

Cihan Bebek


1 Answers

You cannot get the result you want with the current structure using .aggregate(). You "could" change the structure to use an array rather than named keys, and the operation is actually quite simple.

So with a document like:

    { 
            "_id" : "9aa072e4-b706-47e6-9607-1a39e904a05a", 
            "customerId" : "2164289-4", 
            "channelStatuses" : [
                {
                    "channel": "FOO",
                    "status" : "done"
                }, 
                {
                    "channel": "BAR",
                    "status" : "error"
                }
            ], 
            "channel" : "BAR", 
    }

You can then do in modern releases with $filter, $map and $arrayElemAt:

    { "$group": {
        "_id": {
            "customerId" : "$customerId", 
            "channel" : "$channel", 
            "status": {
                "$arrayElemAt": [
                    { "$map": {
                        "input": { "$filter": {
                            "input": "$chanelStatuses",
                            "as": "el", 
                            "cond": { "$eq": [ "$$el.channel", "$channel" ] }
                        }},
                        "as": "el",
                        "in": "$$el.status"
                    }},
                    0
                ]
            }
        },
        "count": { "$sum": 1 }
    }}

Older versions of MongoDB are going to going to require $unwind to access the matched array element.

In MongoDB 2.6 then you can still "pre-filter" the array before unwind:

[
    { "$project": {
        "customerId": 1,
        "channel": 1,
        "status": {
            "$setDifference": [
                { "$map": {
                    "input": "$channelStatuses",
                    "as": "el",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$el.channel", "$channel" ] },
                            "$$el.status",
                            false
                        ]
                    }
                }},
                [false]
            ]
        }
    }},
    { "$unwind": "$status" },
    { "$group": {
        "_id": {
            "customerId": "$customerId",
            "channel": "$channel",
            "status": "$status"
        },
        "count": { "$sum": 1 }
    }}
]

And anything prior to that you "filter" after $unwind instead:

[
    { "$unwind": "$channelStatuses" },
    { "$project": {
        "customerId": 1,
        "channel": 1,
        "status": "$channelStatuses.status",
        "same": { "$eq": [ "$channelStatuses.status", "$channel" ] }
    }},
    { "$match": { "same": true } },
    { "$group": {
        "_id": "$_id",
        "customerId": { "$first": "$customerId" },
        "channel": { "$first": "$channel" },
        "status": { "$first": "$status" }
    }},
    { "$group": {
        "_id": {
            "customerId": "$customerId",
            "channel": "$channel",
            "status": "$status"
        },
        "count": { "$sum": 1 }
    }}
]

In a lesser version than MongoDB 2.6 you also need to $project the result of the equality test between the two fields and then $match on the result in a seperate stage. You might also note the "two" $group stages, since the first one removes any possible duplicates of the "channel" values after the filter via the $first accumulators. The following $group is exactly the same as in the previous listing.

But if you cannot change the structure and need "flexible" matching of keys where you cannot supply every name, then you must use mapReduce:

db.collection.mapReduce(
    function() {
       emit({
           "customerId": this.customerId,
           "channel": this.channel,
           "status": this.channelStatuses[this.channel].status
       },1);
    },
    function(key,values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
)

Where of course you can use that sort of notation

like image 91
Blakes Seven Avatar answered Nov 08 '22 08:11

Blakes Seven