
MongoDB group average array

I'm trying to use a PyMongo aggregate with $group to average arrays element by element, and I cannot find any examples that match my problem.

Data example

{
    Subject: "Dave",
    Strength: [1,2,3,4]
},
{
    Subject: "Dave",
    Strength: [1,2,3,5]
},
{
    Subject: "Dave",
    Strength: [1,2,3,6]
},
{
    Subject: "Stuart",
    Strength: [4,5,6,7]
},
{
    Subject: "Stuart",
    Strength: [6,5,6,7]
},
{
    Subject: "Kevin",
    Strength: [1,2,3,4]
},
{
    Subject: "Kevin",
    Strength: [9,4,3,4]
}

Wanted results

{
    Subject: "Dave",
    mean_strength: [1,2,3,5]
},
{
    Subject: "Stuart",
    mean_strength: [5,5,6,7]
},
{
    Subject: "Kevin",
    mean_strength: [5,3,3,4]
}

I have tried this approach, but MongoDB seems to interpret the arrays as null:

pipe = [{'$group': {'_id': 'Subject', 'mean_strength': {'$avg': '$Strength'}}}]
results = db.Walk.aggregate(pipeline=pipe)

Out: [{'_id': 'SubjectID', 'total': None}]

I've looked through the MongoDB documentation, but I cannot work out whether there is a way to do this.

Kasper Rasmussen asked Dec 18 '17 13:12


1 Answer

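Your attempt returns null because, inside a $group stage, $avg skips non-numeric values, and an array counts as non-numeric. Note also that '_id': 'Subject' without the $ prefix is a constant string, so every document ends up in a single group.

You could use $unwind with includeArrayIndex. As the name suggests, includeArrayIndex adds each element's array index to the output, which allows grouping by Subject and by position within Strength. After calculating the averages, the results need to be sorted so that the second $group and its $push put the values back in the right order. Finally, a $project stage includes and renames the relevant fields.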

db.test.aggregate([{
        "$unwind": {
            "path": "$Strength",
            "includeArrayIndex": "rownum"
        }
    },
    {
        "$group": {
            "_id": {
                "Subject": "$Subject",
                "rownum": "$rownum"
            },
            "mean_strength": {
                "$avg": "$Strength"
            }
        }
    },
    {
        "$sort": {
            "_id.Subject": 1,
            "_id.rownum": 1
        }
    },
    {
        "$group": {
            "_id": "$_id.Subject",
            "mean_strength": {
                "$push": "$mean_strength"
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "Subject": "$_id",
            "mean_strength": 1
        }
    }
])
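
Since the question uses PyMongo, the pipeline above could be written in Python roughly as follows. This is a minimal sketch rather than a tested deployment: the collection name Walk is taken from the question, and the helper name mean_strength_by_subject is hypothetical. The pipeline is plain Python data, so building it needs no driver import.

```python
# Same five stages as the shell pipeline above, written as Python dicts.
PIPELINE = [
    # One output document per array element; "rownum" holds the element's index.
    {"$unwind": {"path": "$Strength", "includeArrayIndex": "rownum"}},
    # Average the values that share a Subject and an array position.
    {"$group": {
        "_id": {"Subject": "$Subject", "rownum": "$rownum"},
        "mean_strength": {"$avg": "$Strength"},
    }},
    # Sort so the averages are pushed back in array order.
    {"$sort": {"_id.Subject": 1, "_id.rownum": 1}},
    # Rebuild one array per Subject.
    {"$group": {
        "_id": "$_id.Subject",
        "mean_strength": {"$push": "$mean_strength"},
    }},
    # Drop _id and expose its value as Subject.
    {"$project": {"_id": 0, "Subject": "$_id", "mean_strength": 1}},
]


def mean_strength_by_subject(collection):
    """Run the pipeline on a PyMongo collection (hypothetical helper)."""
    return list(collection.aggregate(PIPELINE))
```

With a database handle db as in the question, this would be called as mean_strength_by_subject(db.Walk).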

For your test input, this returns:

{ "mean_strength" : [ 5, 5, 6, 7 ], "Subject" : "Stuart" }
{ "mean_strength" : [ 5, 3, 3, 4 ], "Subject" : "Kevin" }
{ "mean_strength" : [ 1, 2, 3, 5 ], "Subject" : "Dave" }
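
To sanity-check these numbers without a database, the same per-position average can be computed in plain Python. This is only an illustrative cross-check on the sample data from the question, not part of the aggregation itself; the function name elementwise_means is hypothetical, and it assumes every Strength array for a subject has the same length.

```python
from collections import defaultdict

# Sample documents copied from the question.
docs = [
    {"Subject": "Dave", "Strength": [1, 2, 3, 4]},
    {"Subject": "Dave", "Strength": [1, 2, 3, 5]},
    {"Subject": "Dave", "Strength": [1, 2, 3, 6]},
    {"Subject": "Stuart", "Strength": [4, 5, 6, 7]},
    {"Subject": "Stuart", "Strength": [6, 5, 6, 7]},
    {"Subject": "Kevin", "Strength": [1, 2, 3, 4]},
    {"Subject": "Kevin", "Strength": [9, 4, 3, 4]},
]


def elementwise_means(documents):
    """Average the Strength arrays position by position for each Subject."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["Subject"]].append(doc["Strength"])
    # zip(*arrays) pairs up the values found at each array position.
    return {
        subject: [sum(column) / len(column) for column in zip(*arrays)]
        for subject, arrays in grouped.items()
    }


result = elementwise_means(docs)
print(result)
```

The values in result agree with the aggregation output: [1, 2, 3, 5] for Dave, [5, 5, 6, 7] for Stuart and [5, 3, 3, 4] for Kevin, returned here as floats.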
Alex answered Sep 23 '22 13:09