
How to Implement $bucket to group by multiple fields

First, I bucket users by age, using the boundaries [0,20,30,40,50,200]:

db.user.aggregate([
    { $project: { _id: 0, age: { $subtract: [ { $year: new Date() }, { $year: "$birthDay" } ] } } },
    { $bucket: {
        groupBy: "$age",
        boundaries: [0,20,30,40,50,200]
    }},
    { $project: { _id: 0, age: "$_id", count: 1 } }
])

This gives the following result:

{ "count" : 5, "age" : 20 }
{ "count" : 1, "age" : 30 }

Next, I want the count for each age range within each city, like this:

{ city : "SH", age: 20, count: 2 }
{ city : "BJ", age: 20, count: 3 }
{ city : "BJ", age: 30, count: 1 }

How can I implement this?

For reference, here are the ages grouped by city:

db.user.aggregate([
    { $project: { _id: 0, city: 1, age: { $subtract: [ { $year: new Date() }, { $year: "$birthDay" } ] } } },
    { $group: { _id: "$city", ages: { $push: "$age" } } },
    { $project: { _id: 0, city: "$_id", ages: 1 } }
])

{ "city" : "SH", "ages" : [ 26, 26 ] }
{ "city" : "BJ", "ages" : [ 27, 26, 26, 36 ] }
zhuguowei asked Dec 24 '22 17:12


1 Answer

What you are talking about is actually implemented with $switch, within a regular $group stage:

db.user.aggregate([
  { "$group": {
    "_id": {
      "city": "$city",
      "age": {
        "$let": {
          "vars": { 
            "age": { "$subtract" :[{ "$year": new Date() },{ "$year": "$birthDay" }] }
          },
          "in": {
            "$switch": {
              "branches": [
                { "case": { "$lt": [ "$$age", 20 ] }, "then": 0 },
                { "case": { "$lt": [ "$$age", 30 ] }, "then": 20 },
                { "case": { "$lt": [ "$$age", 40 ] }, "then": 30 },
                { "case": { "$lt": [ "$$age", 50 ] }, "then": 40 },
                { "case": { "$lt": [ "$$age", 200 ] }, "then": 50 }
              ]
            }
          }
        }
      }
    },
    "count": { "$sum": 1 }
  }}
])

With the results:

{ "_id" : { "city" : "BJ", "age" : 30 }, "count" : 1 }
{ "_id" : { "city" : "BJ", "age" : 20 }, "count" : 3 }
{ "_id" : { "city" : "SH", "age" : 20 }, "count" : 2 }
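To get the flat { city, age, count } shape asked for in the question, one option is to append a $project stage that lifts the two parts of the compound "_id" to the top level. This is a minimal sketch; the variable names `flattenStage` and `groupStage` are only illustrative and are not part of the answer's pipeline.

```javascript
// Sketch: an extra pipeline stage that flattens the compound _id
// produced by the $group stage into top-level fields.
const flattenStage = {
  "$project": {
    "_id": 0,               // drop the compound key itself
    "city": "$_id.city",    // lift the city out of _id
    "age": "$_id.age",      // lift the bucket label out of _id
    "count": 1              // keep the accumulator unchanged
  }
};

// Usage in the mongo shell, assuming the $group stage shown above
// has been stored in a (hypothetical) variable named groupStage:
// db.user.aggregate([ groupStage, flattenStage ])
```

With this stage appended, each result document should look like { "city" : "BJ", "age" : 20, "count" : 3 } rather than carrying a nested "_id".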

The $bucket pipeline stage accepts only a single "groupBy" expression, whose value is matched against "boundaries". You can have multiple accumulators via the "output" option, but there is no place to add a second grouping key such as the city.

Note that you can also use $let here, instead of a separate $project pipeline stage, to calculate the "age".

N.B. If you pass an erroneous expression to $bucket, you will get errors that mention $switch, which hints that this is how it is implemented internally.
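One subtle difference is worth checking before relying on the $switch version. $bucket treats each lower boundary as inclusive and each upper boundary as exclusive, so a value below the first boundary matches no bucket. The $lt chain above instead puts any value below 20, including a negative one, under the label 0. The plain JavaScript sketch below mirrors the $lt chain so that the labels can be checked locally; `switchLabel` is a hypothetical helper name, not a MongoDB function.

```javascript
// Plain JavaScript mirror of the $lt chain used in the $switch above.
// Returns the lower boundary of the first range whose upper bound
// is greater than the given age.
function switchLabel(age, boundaries) {
  for (let i = 1; i < boundaries.length; i++) {
    if (age < boundaries[i]) {
      return boundaries[i - 1];
    }
  }
  // Mirrors $switch with no "default": nothing matched, so fail loudly.
  throw new Error("no branch matched for age " + age);
}

const boundaries = [0, 20, 30, 40, 50, 200];

// Ages taken from the sample data in the question.
const labels = [26, 27, 36].map(function (age) {
  return switchLabel(age, boundaries);
});
// labels is [20, 20, 30]
```

If negative or missing ages are possible in your data, it may be safer to filter them out with a $match stage first.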


If you would rather not write out the $switch branches by hand, just generate them:

var ranges = [0,20,30,40,50,200];
var branches = [];
for ( var i=1; i < ranges.length; i++) {
  branches.push({ "case": { "$lt": [ "$$age", ranges[i] ] }, "then": ranges[i-1] });
}

db.user.aggregate([
  { "$group": {
    "_id": {
      "city": "$city",
      "age": {
        "$let": {
          "vars": {
            "age": { 
              "$subtract": [{ "$year": new Date() },{ "$year": "$birthDay" }]
            }
          },
          "in": {
            "$switch": { "branches": branches }
          }
        }
      }
    },
    "count": { "$sum": 1 }
  }}
])
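A $switch with no "default" raises an error when no branch matches, which here would happen for any age of 200 or more. As a hedged variation on the loop above, the sketch below wraps the generation in a function and adds a "default" label. The names `buildAgeSwitch` and the "other" label are assumptions made for illustration.

```javascript
// Sketch: build the whole $switch expression from a sorted boundary list,
// with a "default" so that out-of-range values do not raise an error.
function buildAgeSwitch(boundaries, fallbackLabel) {
  const branches = [];
  for (let i = 1; i < boundaries.length; i++) {
    branches.push({
      "case": { "$lt": [ "$$age", boundaries[i] ] },  // upper bound is exclusive
      "then": boundaries[i - 1]                       // label is the lower bound
    });
  }
  return {
    "$switch": {
      "branches": branches,
      "default": fallbackLabel   // used when no branch matches
    }
  };
}

const ageSwitch = buildAgeSwitch([0, 20, 30, 40, 50, 200], "other");

// Usage: place ageSwitch where the "in" expression of the $let sits
// in the pipeline above, for example "in": ageSwitch
```

Documents that match no range are then counted under the "other" label instead of aborting the aggregation.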
Neil Lunn answered Jan 10 '23 21:01