This is a question about the best way to add up values held in an array, where each value has to be matched by a sibling key. I'm trying to use the aggregation framework introduced in MongoDB 2.2, and it's possible I can do this with a simple $group.
So for a given set of documents I'm trying to get output like this:
{
    "result" : [
        {
            "_id" : null,
            "numberOf" : 2,
            "sales" : 468000,
            "profit" : 246246
        }
    ],
    "ok" : 1
}
Now, I originally had a list of documents containing values assigned to named properties, e.g.:
[
    {
        _id : 1,
        finance: {
            sales: 234000,
            profit: 123123
        }
    },
    {
        _id : 2,
        finance: {
            sales: 234000,
            profit: 123123
        }
    }
]
This was easy enough to add up, but the structure didn't work for other reasons. For instance, there are many other columns like "finance" and I want to be able to index them without creating thousands of indexes, so I need to convert to a structure like this:
[
    {
        _id : 1,
        finance: [
            {
                "k": "sales",
                "v": {
                    "description": "sales over the year",
                    v: 234000
                }
            },
            {
                "k": "profit",
                "v": {
                    "description": "money made from sales",
                    v: 123123
                }
            }
        ]
    },
    {
        _id : 2,
        finance: [
            {
                "k": "sales",
                "v": {
                    "description": "sales over the year",
                    v: 234000
                }
            },
            {
                "k": "profit",
                "v": {
                    "description": "money made from sales",
                    v: 123123
                }
            }
        ]
    }
]
I can index finance.k if I want, but I'm struggling to build an aggregate query that adds up all the numbers matching a particular key. That was the reason I originally went for named properties, but this really needs to work in a situation where there are thousands of "k" labels.
Does anyone know how to build an aggregate query for this using the new framework? I've tried this:
db.projects.aggregate([
    {
        $match: {
            // QUERY
            $and: [
                // main query
                {},
            ]
        }
    },
    {
        $group: {
            _id: null,
            "numberOf": { $sum: 1 },
            "sales": { $sum: "$finance.v.v" },
            "profit": { $sum: "$finance.v.v" },
        }
    },
])
but I get:
{
    "errmsg" : "exception: can't convert from BSON type Array to double",
    "code" : 16005,
    "ok" : 0
}
** For extra kudos, I'll need to be able to do this in a MapReduce query as well.
If used on a field that contains both numeric and non-numeric values, $sum ignores the non-numeric values and returns the sum of the numeric values. If used on a field that does not exist in any document in the collection, $sum returns 0 for that field. If all operands are non-numeric, $sum returns 0.
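As a rough illustration of those rules, here is a plain JavaScript sketch that mimics them outside the database. The helper name sumLikeMongo is hypothetical and is not part of MongoDB; it only emulates the behaviour described above for a flat list of values.

```javascript
// Hypothetical helper (not a MongoDB API): emulates how $sum is described
// as treating the values of one field gathered across several documents.
function sumLikeMongo(values) {
    var total = 0;
    values.forEach(function (v) {
        // Only numbers contribute; strings, arrays, null and missing
        // (undefined) values are skipped.
        if (typeof v === "number") {
            total += v;
        }
    });
    return total;
}

// Mixed numeric and non-numeric values: only the numbers are added.
var mixed = sumLikeMongo([234000, "n/a", 123123, null]);

// Field absent from every document: nothing to add, so the result is 0.
var missing = sumLikeMongo([undefined, undefined]);
```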
With aggregate + $match, you get a big monolithic BSON containing all matching documents. With find, you get a cursor to all matching documents. Then you can get each document one by one.
$match takes a document that specifies the query conditions. The query syntax is identical to the read operation query syntax; i.e. $match does not accept raw aggregation expressions. Instead, use a $expr query expression to include an aggregation expression in $match.
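A minimal sketch of the difference, using the named-property documents from the question (finance.sales and finance.profit). The threshold 200000 is made up for illustration, and $expr is only available in MongoDB 3.6 and later, so it does not apply to the 2.2 release the question is about.

```javascript
// Ordinary query syntax: compares a field against a constant.
var plainMatch = {
    $match: { "finance.sales": { $gt: 200000 } }
};

// Comparing two fields of the same document needs an aggregation
// expression, so it has to be wrapped in $expr.
var exprMatch = {
    $match: { $expr: { $gt: ["$finance.sales", "$finance.profit"] } }
};

// In the mongo shell (3.6+), either stage can be used on its own, e.g.:
//   db.projects.aggregate([ exprMatch ])
```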
Because of this, if you have a simple aggregation pipeline, or one that does not cut down the data volume much, it can often be quicker to use find() and perform the aggregation client-side.
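A minimal sketch of that client-side approach for the key/value layout in the question. It assumes the documents have already been fetched (for example with db.projects.find().toArray() in the shell); the function name totalsByKey and the inline sample array are illustrative only.

```javascript
// Illustrative helper: totals finance[].v.v per key across all documents.
function totalsByKey(docs) {
    var totals = {};
    docs.forEach(function (doc) {
        // Tolerate documents that have no finance array at all.
        (doc.finance || []).forEach(function (entry) {
            if (!Object.prototype.hasOwnProperty.call(totals, entry.k)) {
                totals[entry.k] = { numberOf: 0, total: 0 };
            }
            totals[entry.k].numberOf += 1;
            totals[entry.k].total += entry.v.v;
        });
    });
    return totals;
}

// Sample data copied from the question. In the shell this would come from
// something like db.projects.find().toArray().
var docs = [
    { _id: 1, finance: [
        { k: "sales",  v: { description: "sales over the year",   v: 234000 } },
        { k: "profit", v: { description: "money made from sales", v: 123123 } }
    ] },
    { _id: 2, finance: [
        { k: "sales",  v: { description: "sales over the year",   v: 234000 } },
        { k: "profit", v: { description: "money made from sales", v: 123123 } }
    ] }
];

var totals = totalsByKey(docs);
```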
You can use the aggregation framework to get sales and profit and any other value you may be storing in your key/value pair representation.
For your example data:
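var pipeline = [
    // Emit one document per element of the finance array.
    {
        "$unwind" : "$finance"
    },
    // Group the unwound elements by key and total their values.
    {
        "$group" : {
            "_id" : "$finance.k",
            "numberOf" : {
                "$sum" : 1
            },
            "total" : {
                "$sum" : "$finance.v.v"
            }
        }
    }
];
var R = db.tb.aggregate( pipeline );
printjson(R);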
{
    "result" : [
        {
            "_id" : "profit",
            "numberOf" : 2,
            "total" : 246246
        },
        {
            "_id" : "sales",
            "numberOf" : 2,
            "total" : 468000
        }
    ],
    "ok" : 1
}
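If you want the single summary document shown in the question rather than one row per key, one option is to pivot the keys back into fields with $cond. This is a sketch rather than something tested against your data: the first $group rebuilds one row per original document, so that numberOf counts documents and not array elements, and the second $group collapses everything into one row. The name summaryPipeline is just a label for this example.

```javascript
var summaryPipeline = [
    // One document per element of the finance array.
    { "$unwind" : "$finance" },
    // Rebuild one row per original document, routing each value to the
    // field named after its key (0 when the key does not match).
    {
        "$group" : {
            "_id" : "$_id",
            "sales" : {
                "$sum" : { "$cond" : [ { "$eq" : ["$finance.k", "sales"] }, "$finance.v.v", 0 ] }
            },
            "profit" : {
                "$sum" : { "$cond" : [ { "$eq" : ["$finance.k", "profit"] }, "$finance.v.v", 0 ] }
            }
        }
    },
    // Collapse all documents into a single summary row.
    {
        "$group" : {
            "_id" : null,
            "numberOf" : { "$sum" : 1 },
            "sales" : { "$sum" : "$sales" },
            "profit" : { "$sum" : "$profit" }
        }
    }
];

// In the mongo shell: db.tb.aggregate( summaryPipeline )
```

For the two sample documents this should give numberOf 2, sales 468000 and profit 246246. The cost is that every key you want has to be listed by hand, so it suits a short list of keys better than thousands of them.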
If you have additional k/v pairs, you can add a $match stage after the $unwind that only passes through k values in ["sales", "profit"].
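A sketch of that filter, assuming the same collection and keys as above; the names wanted and filteredPipeline are only labels for this example. The first $match is optional: it runs before $unwind, so it can use an index on finance.k to skip documents that contain none of the wanted keys. The second $match is the one that actually drops the unwanted array elements once they have been unwound.

```javascript
var wanted = ["sales", "profit"];

var filteredPipeline = [
    // Optional: skip whole documents that have none of the wanted keys.
    { "$match" : { "finance.k" : { "$in" : wanted } } },
    // One document per element of the finance array.
    { "$unwind" : "$finance" },
    // Keep only the unwound elements whose key is wanted.
    { "$match" : { "finance.k" : { "$in" : wanted } } },
    // Total the remaining values per key, as before.
    {
        "$group" : {
            "_id" : "$finance.k",
            "numberOf" : { "$sum" : 1 },
            "total" : { "$sum" : "$finance.v.v" }
        }
    }
];

// In the mongo shell: db.tb.aggregate( filteredPipeline )
```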