Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get the maximum value of a field for each group with the array of corresponding documents?

I have a collection like

{
    "_id" : ObjectId("5738cb363bb56eb8f76c2ba8"),
    "records" : [
        {
            "Name" : "Joe",
            "Salary" : 70000,
            "Department" : "IT"
        }
    ]
},
{
    "_id" : ObjectId("5738cb363bb56eb8f76c2ba9"),
    "records" : [
        {
            "Name" : "Henry",
            "Salary" : 80000,
            "Department" : "Sales"
        },
        {
            "Name" : "Jake",
            "Salary" : 40000,
            "Department" : "Sales"
        }
    ]
},
{
    "_id" : ObjectId("5738cb363bb56eb8f76c2baa"),
    "records" : [
        {
            "Name" : "Sam",
            "Salary" : 90000,
            "Department" : "IT"
        },
        {
            "Name" : "Tom",
            "Salary" : 50000,
            "Department" : "Sales"
        }
    ]
}

I want to have the results with the highest salary by each department

{"Name": "Sam", "Salary": 90000, "Department": "IT"}
{"Name": "Henry", "Salary": 80000, "Department": "Sales"}

I could get the highest salary. But I could not get the corresponding employee names.

db.HR.aggregate([

    { "$unwind": "$records" },
    { "$group": 
        {
            "_id": "$records.Department",
            "max_salary": { "$max": "$records.Salary" }
        }
    }   
])

Could somebody help me?

like image 615
dapangmao Avatar asked Nov 30 '15 21:11

dapangmao


3 Answers

You need to $sort your document after $unwind and use the $first operator in the $group stage. You can also use the $last operator in which case you will need to sort your documents in ascending order

db.HR.aggregate([
    { '$unwind': '$records' }, 
    { '$sort': { 'records.Salary': -1 } }, 
    { '$group': { 
        '_id': '$records.Department', 
        'Name': { '$first': '$records.Name' } , 
        'Salary': { '$first': '$records.Salary' }
    }}
])

which produces:

{ "_id" : "Sales", "Name" : "Henry", "Salary" : 80000 }
{ "_id" : "IT", "Name" : "Sam", "Salary" : 90000 }

To return the maximum salary and employees list for each department you need to use the $max in your group stage to return the maximum "Salary" for each group then use $push accumulator operator to return a list of "Name" and "Salary" for all employees for each group. From there you need to use the $map operator in your $project stage to return a list of names alongside the maximum salary. Of course the $cond here is used to compare each employee salary to the maximum value. The $setDifference does his work which is filter out all false and is fine as long as the data being filtered is "unique". In this case it "should" be fine, but if any two results contained the same "name" then it would skew results by considering the two to be one.

db.HR.aggregate([
    { '$unwind': '$records' },
    { '$group': {
        '_id': '$records.Department', 
        'maxSalary': { '$max': '$records.Salary' }, 
        'persons': { 
            '$push': {
                'Name': '$records.Name', 
                'Salary': '$records.Salary' 
            }
        }
    }}, 
    { '$project': { 
        'maxSalary': 1, 
        'persons': { 
            '$setDifference': [
                { '$map': {
                    'input': '$persons', 
                    'as': 'person', 
                    'in': {
                        '$cond': [
                            { '$eq': [ '$$person.Salary', '$maxSalary' ] }, 
                            '$$person.Name', 
                            false
                        ]
                    }
                }}, 
                [false]
            ]
        }
    }}
])

which yields:

{ "_id" : "Sales", "maxSalary" : 80000, "persons" : [ "Henry" ] }
{ "_id" : "IT", "maxSalary" : 90000, "persons" : [ "Sam" ] }
like image 123
styvane Avatar answered Oct 23 '22 16:10

styvane


Its not the most intuitive thing, but instead of $max you should be using $sort and $first:

{ "$unwind": "$records" },
{ "$sort": { "$records.Salary": -1},
{ "$group" : 
    {
        "_id": "$records.Department",
        "max_salary": { "$first": "$records.Salary" },
        "name": {$first: "$records.Name"}
    }
}

Alternatively, I think this is doable using the $$ROOT operator (fair warning: I've not actually tried this) -

{ "$unwind": "$records" },
{ "$group": 
        {
            "_id": "$records.Department",
            "max_salary": { "$max": "$records.Salary" }
            "name" : "$$ROOT.records.Name"
        }
    }
}
like image 35
David says Reinstate Monica Avatar answered Oct 23 '22 15:10

David says Reinstate Monica


Another possible solution:

db.HR.aggregate([
    {"$unwind": "$records"},
    {"$group":{
        "_id": "$records.Department",
        "arr": {"$push": {"Name":"$records.Name", "Salary":"$records.Salary"}},
        "maxSalary": {"$max":"$records.Salary"}
    }},
    {"$unwind": "$arr"},
    {"$project": {
        "_id":1,
        "arr":1,
        "isMax":{"$eq":["$arr.Salary", "$maxSalary"]}
    }},
    {"$match":{
        "isMax":true
    }}
])

This solution takes advantage of the $eq operator to compare two fields in the $project stage.

Test case:

db.HR.insert({"records": [{"Name": "Joe", "Salary": 70000, "Department": "IT"}]})
db.HR.insert({"records": [{"Name": "Henry", "Salary": 80000, "Department": "Sales"}, {"Name": "Jake", "Salary": 40000, "Department": "Sales"}, {"Name": "Santa", "Salary": 90000, "Department": "IT"}]})
db.HR.insert({"records": [{"Name": "Sam", "Salary": 90000, "Department": "IT"}, {"Name": "Tom", "Salary": 50000, "Department": "Sales"}]})

Result:

{ "_id" : "Sales", "arr" : { "Name" : "Henry", "Salary" : 80000 }, "isMax" : true }
{ "_id" : "IT", "arr" : { "Name" : "Santa", "Salary" : 90000 }, "isMax" : true }
{ "_id" : "IT", "arr" : { "Name" : "Sam", "Salary" : 90000 }, "isMax" : true }
like image 1
joao Avatar answered Oct 23 '22 14:10

joao