MongoDB 2.1 Aggregate Framework Sum of Array Elements matching a name

Tags:

mongodb

This is a question about the best way to add up a series of values in an array, where each value has to be matched against another element (its key). I'm trying to use the 2.2 aggregation framework, and it's possible I can do this with a simple group.

So, for a given set of documents, I'm trying to get output like this:

{
    "result" : [
            {
                "_id" : null,
                "numberOf": 2,
                "sales" : 468000,
                "profit" : 246246
            }
    ],
    "ok" : 1
}

Originally I had a list of documents containing values assigned to named properties, e.g.:

[
{
    _id : 1,
    finance: {
        sales: 234000,
        profit: 123123,
    }
}
,
{
    _id : 2,
    finance: {
        sales: 234000,
        profit: 123123,
    }
}
]
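With this named-property layout the totals come from a single $group stage, because "$finance.sales" and "$finance.profit" each resolve to one number per document. A minimal mongo shell sketch, assuming the collection name `projects` used in the query further down; the `typeof db` guard only lets the file also load outside the shell:

```javascript
// Original layout: finance.sales and finance.profit are plain numbers,
// so each field path yields one number per document and $sum can add it.
var namedPipeline = [
    {
        $group: {
            _id: null,                            // one group for all documents
            numberOf: { $sum: 1 },                // counts documents
            sales:    { $sum: "$finance.sales" },
            profit:   { $sum: "$finance.profit" }
        }
    }
];

// Only executes inside the mongo shell, where `db` and `printjson` exist.
if (typeof db !== "undefined") {
    printjson(db.projects.aggregate(namedPipeline));
}
```

For the two documents above this should give numberOf 2, sales 468000 and profit 246246, i.e. the target output at the top.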

This was easy enough to add up, but the structure didn't work for other reasons. For instance, there are many other columns like "finance", and I want to be able to index them without creating thousands of indexes, so I need to convert to a structure like this:

[
{
    _id : 1,
    finance: [
        {
            "k": "sales",
            "v": {
                "description":"sales over the year",
                v: 234000,
            }
        },
        {
            "k": "profit",
            "v": {
                "description":"money made from sales",
                v: 123123,
            }
        }
    ]
}
,
{
    _id : 2,
    finance: [
        {
            "k": "sales",
            "v": {
                "description":"sales over the year",
                v: 234000,
            }
        },
        {
            "k": "profit",
            "v": {
                "description": "money made from sales",
                v: 123123,
            }
        }
    ]
}
]

I can index finance.k if I want, but then I'm struggling to build an aggregate query that adds up all the numbers matching a particular key. This was the reason I originally went for named properties, but this really needs to work in a situation where there are thousands of "k" labels.
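Stated in plain JavaScript, the wanted computation is: in every document, pick the `finance` elements whose `k` equals the requested key and add up their `v.v`. A client-side sketch using the two documents above (descriptions omitted); `sumByKey` is a hypothetical helper for illustration, not part of MongoDB:

```javascript
// The two documents in the k/v layout shown above, descriptions omitted.
var docs = [
    { _id: 1, finance: [ { k: "sales",  v: { v: 234000 } },
                         { k: "profit", v: { v: 123123 } } ] },
    { _id: 2, finance: [ { k: "sales",  v: { v: 234000 } },
                         { k: "profit", v: { v: 123123 } } ] }
];

// Hypothetical helper: counts the elements whose k matches `key`
// and totals their v.v values across all documents.
function sumByKey(docs, key) {
    var result = { numberOf: 0, total: 0 };
    docs.forEach(function (doc) {
        doc.finance.forEach(function (entry) {
            if (entry.k === key) {
                result.numberOf += 1;
                result.total += entry.v.v;
            }
        });
    });
    return result;
}
```

For these documents `sumByKey(docs, "sales")` returns `{ numberOf: 2, total: 468000 }` and `sumByKey(docs, "profit")` returns a total of 246246. The question is how to get the server to do the same filtering by `k` before summing.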

Does anyone know how to build an aggregate query for this using the new framework? I've tried this:

db.projects.aggregate([
    {
        $match: {
            // QUERY
            $and: [
                // main query
                {},
            ]
        }
    },
    {
        $group: {
            _id: null,
            "numberOf": { $sum: 1 },
            "sales":    { $sum: "$finance.v.v" },
            "profit":   { $sum: "$finance.v.v" },
        }
    },
])

but I get:

{
    "errmsg" : "exception: can't convert from BSON type Array to double",
    "code" : 16005,
    "ok" : 0
}
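The error comes from how the field path resolves. Because `finance` is an array, "$finance.v.v" yields, for each document, an array holding every element's `v.v` rather than a single number, and the 2.2 `$sum` accumulator cannot convert that array to a double. Both accumulators also use the same path, so nothing ties "sales" or "profit" to its `k`. The hypothetical `resolvePath` helper below only approximates MongoDB's field-path traversal (it is not a MongoDB API), but it shows what `$sum` receives for one of the documents above:

```javascript
// Hypothetical helper, NOT a MongoDB API: roughly mimics how an aggregation
// field path walks a document. When a step lands on an array, the rest of
// the path is applied to every element and the results are collected.
function resolvePath(value, parts) {
    if (parts.length === 0 || value === undefined || value === null) {
        return value;
    }
    if (Array.isArray(value)) {
        return value
            .map(function (element) { return resolvePath(element, parts); })
            .filter(function (x) { return x !== undefined; }); // drop misses
    }
    return resolvePath(value[parts[0]], parts.slice(1));
}

// One of the documents from the question, descriptions omitted.
var doc = {
    _id: 1,
    finance: [
        { k: "sales",  v: { v: 234000 } },
        { k: "profit", v: { v: 123123 } }
    ]
};

// "$finance.v.v" for this document is [234000, 123123]: an array, not a
// number, which $sum in a 2.2 $group stage rejects with
// "can't convert from BSON type Array to double".
var perDocument = resolvePath(doc, "finance.v.v".split("."));
```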

For extra kudos: I'll also need to be able to do this in a MapReduce query.
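For the MapReduce part, one possible approach (a sketch under the assumptions stated here) is to emit one `(k, partial result)` pair per `finance` element and add the partials up in `reduce`. The collection name `projects` is taken from the question's query; the `typeof db` guard lets the functions be loaded outside the shell:

```javascript
// map: MongoDB calls it once per document with `this` bound to that
// document; it emits one pair per element of the finance array.
var map = function () {
    this.finance.forEach(function (entry) {
        emit(entry.k, { numberOf: 1, total: entry.v.v });
    });
};

// reduce: combines the partial results for one key. It returns the same
// shape it receives, because reduce output may be fed back into reduce.
var reduce = function (key, values) {
    var out = { numberOf: 0, total: 0 };
    values.forEach(function (v) {
        out.numberOf += v.numberOf;
        out.total += v.total;
    });
    return out;
};

// Inside the mongo shell: run with inline output.
if (typeof db !== "undefined") {
    printjson(db.projects.mapReduce(map, reduce, { out: { inline: 1 } }));
}
```

Each inline result should carry the key as `_id` and `{ numberOf, total }` as `value`; restricting to particular keys could be done with a `query` option or by checking `entry.k` inside `map`.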

asked Aug 28 '12 15:08 by cirrus




1 Answer

You can use the aggregation framework to get sales, profit, and any other value you may be storing in your key/value pair representation.

For your example data:

var pipeline = [
    {
        "$unwind" : "$finance"
    },
    {
        "$group" : {
            "_id" : "$finance.k",
            "numberOf" : {
                "$sum" : 1
            },
            "total" : {
                "$sum" : "$finance.v.v"
            }
        }
    }
]

R = db.tb.aggregate( pipeline );
printjson(R);
{
        "result" : [
            {
                "_id" : "profit",
                "numberOf" : 2,
                "total" : 246246
            },
            {
                "_id" : "sales",
                "numberOf" : 2,
                "total" : 468000
            }
        ],
        "ok" : 1
}
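To see what each stage contributes, here is a plain-JavaScript (e.g. Node.js) emulation of the two stages on the same two documents. The `unwind` and `groupByKey` helpers are hypothetical stand-ins written for illustration, not MongoDB functions, and they ignore edge cases such as missing or empty arrays:

```javascript
// Stand-in for { $unwind: "$finance" }: one output document per array
// element, with the array field replaced by that single element.
function unwind(docs, field) {
    var out = [];
    docs.forEach(function (doc) {
        doc[field].forEach(function (element) {
            var copy = Object.assign({}, doc);
            copy[field] = element;
            out.push(copy);
        });
    });
    return out;
}

// Stand-in for the $group stage: group on finance.k, count the
// documents and total finance.v.v per key.
function groupByKey(unwound) {
    var groups = {};
    unwound.forEach(function (doc) {
        var key = doc.finance.k;
        if (!groups[key]) {
            groups[key] = { _id: key, numberOf: 0, total: 0 };
        }
        groups[key].numberOf += 1;
        groups[key].total += doc.finance.v.v;   // now a single number
    });
    return Object.keys(groups).map(function (k) { return groups[k]; });
}

var docs = [
    { _id: 1, finance: [ { k: "sales",  v: { v: 234000 } },
                         { k: "profit", v: { v: 123123 } } ] },
    { _id: 2, finance: [ { k: "sales",  v: { v: 234000 } },
                         { k: "profit", v: { v: 123123 } } ] }
];

var result = groupByKey(unwind(docs, "finance"));
```

After `$unwind` there are four documents, each with a single `finance` element, so `"$finance.v.v"` is a plain number and `$sum` works. `$group` does not guarantee the order of its output, so the order here may differ from the shell output above.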

If you have additional k/v pairs, you can add a $match stage after the $unwind that only passes through elements whose k is in ["sales", "profit"].
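A sketch of that variant, assuming the wanted keys are known up front and reusing the answer's `tb` collection. The `$match` after `$unwind` drops the unwanted elements. The identical `$match` placed first is an optional addition: it filters whole documents and, as the first stage, can use an index on `finance.k` (assuming one exists, as the question suggests):

```javascript
// Keys to keep; everything else is filtered out before grouping.
var keys = ["sales", "profit"];

var filteredPipeline = [
    // Optional: as the first stage this can use an index on finance.k and
    // skips documents that contain none of the wanted keys.
    { $match: { "finance.k": { $in: keys } } },
    { $unwind: "$finance" },
    // Required: after $unwind, keep only the wanted elements.
    { $match: { "finance.k": { $in: keys } } },
    {
        $group: {
            _id: "$finance.k",
            numberOf: { $sum: 1 },
            total: { $sum: "$finance.v.v" }
        }
    }
];

// Only executes inside the mongo shell.
if (typeof db !== "undefined") {
    printjson(db.tb.aggregate(filteredPipeline));
}
```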

answered Sep 20 '22 23:09 by Asya Kamsky