I'm a total newbie to MongoDB. I work on a privat project for my golf club to analyze the round.
I use meteorJS for the Application and tried some aggregation on the command line. But I'm not sure if I even have the right point to the task
A sample document:
{
"_id" : "2KasYR3ytsaX8YuoT",
"course" : {
"id" : "rHmYJBhRtSt38m68s",
"name" : "CourseXYZ"
},
"player" : {
"id" : "tdaYaSvXJueDq4oTN",
"firstname" : "Just",
"lastname" : "aPlayer"
},
"event" : "Training Day",
"tees" : [
{
"tee" : 1,
"par" : 4,
"fairway" : "straight",
"greenInRegulation" : true,
"putts" : 3,
"strokes" : 5
},
{
"tee" : 2,
"par" : 5,
"fairway" : "right",
"greenInRegulation" : true,
"putts" : 2,
"strokes" : 5
},
{
"tee" : 3,
"par" : 5,
"fairway" : "right",
"greenInRegulation" : false,
"shotType": "bunker",
"putts" : 2,
"strokes" : 5
}
]
}
My attempt so far:
db.analysis.aggregate([
{$unwind: "$tees"},
{$group: {
_id:"$player.id",
strokes: {$sum: "$tees.strokes"},
par: {$sum: "$tees.par"},
putts: {$sum: "$tees.putts"},
teesPlayed: {$sum:1}
}}
])
And what I want for a result
{
"_id" : "tdaYaSvXJueDq4oTN",
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3
// here comes what I want to add:
"fairway.straight": 1 // where tees.fairway equals "straight"
"fairway.right": 2 // where tees.fraiway equals "right" (etc.)
"shotType.bunker": 1 // where shotType equals "bunker" etc.
}
There are a few ways of approaching this depending on your overall needs and which MongoDB server version you have available as a target for your project.
Whilst "meteor" installations and default project setups do not "bundle" a MongoDB 3.2 instance, there is no need why your project cannot use such an instance as an external target. If it's a new project to get off the ground, then I would highly recommend working against the latest version available. And maybe even possibly against latest development releases, depending on your own targeted release cycle. Work with what is most fresh, and your application will be too.
For that reason, we start with the latest at the top of the list.
The big feature in MongoDB 3.2 that makes it really stand out here in terms of performance is a change in how $sum
operates. Previously just as an accumulator operator for $group
this would work on singular numeric values to produce a total.
The big improvement is hidden within the $project
stage usage which is added, where $sum
can be directly applied to an array of values. i.e { "$sum": [1,2,3] }
results in 6
. So now you can "nest" the operations with anything that produces an array of values from a source. Most notably here is $map
:
db.analysis.aggregate([
{ "$group": {
"_id": "$player.id",
"strokes": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.strokes"
}
}
}
},
"par": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.par"
}
}
}
},
"putts": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.putts"
}
}
}
},
"teesPlayed": { "$sum": { "$size": "$tees" } },
"shotsRight": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.fairway", "right" ] }
}
}
}
},
"shotsStraight": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.fairway", "straight" ] }
}
}
}
},
"bunkerShot": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.shotType", "bunker" ] }
}
}
}
}
}}
])
So here each field is split out by either doing the double $sum
trick on the single field values from the array items, or in contrast the arrays are being processed with $filter
to just restrict to matching items and processed for lenght of matches with $size
, for the result fields that rather want "counts".
Though this looks long winded in pipeline construction it will yield the fasted results. And though you need to specify all of the keys to result with the associated logic, there is nothing stopping "generation" of the data structure necessary for the pipeline as the result of other queries on the data set.
Of course not every project can practically use the latest version of things. So before a MongoDB 3.2 release that introduced some of the operators used above, the only real practical way to work with array data and conditionally work with different elements and sums was to process first with $unwind
.
So essentially we start with the query you began to construct, but then add in the handling for the different fields:
db.analysis.aggregate([
{ "$unwind": "$tees" },
{ "$group": {
"_id": "$player.id",
"strokes": { "$sum": "$tees.strokes" },
"par": { "$sum": "$tees.par" },
"putts": { "$sum": "$tees.putts" },
"teedsPlayed": { "$sum": 1 },
"shotsRight": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.fairway", "right" ] },
1,
0
]
}
},
"shotsStraight": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.fairway", "straight" ] },
1,
0
]
}
},
"bunkerShot": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.shotType", "bunker" ] },
1,
0
]
}
}
}}
])
So you should note that there is still "some" similarity to the first listing, in that where the $filter
statements all have some logic within there "cond"
argument, that logic is rather transposed to the $cond
operator here.
As a "ternary" operator ( if/then/else) , it's job it is to evaluate a logical condition (if) and either return the next argument where that condition was true
(then) or otherwise return the last argument where it is false
(else). In this case either 1
or 0
depending on whether the tested condition matched. This gives the "counts" to $sum
as is required.
In either statement, the produced results come out like this:
{
"_id" : "tdaYaSvXJueDq4oTN",
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3,
"shotsRight" : 2,
"shotsStraight" : 1,
"bunkerShot" : 1
}
Since this is an aggregate statement with $group
, then one rule is that the "keys" ( apart from needing to be specified in the constructed statement ) must be in the "top-level" of the structure. So no "nested" structures are allowed within a $group
, hence the whole names for each key.
If you really must transform, then you can by adding a $project
stage following the $group
in each example:
{ "$project": {
"strokes": 1,
"par": 1,
"putts": 1,
"teesPlayed": 1,
"fairway": {
"straight": "$shotsStraight",
"right": "$shotsRight"
},
"shotType": {
"bunker": "$bunkerShot"
}
}}
So a bit of "re-shaping" can be done, but of course all the names and structure must be specified, though again you could in theory just generate this all in code. It is just a data structure after all.
The bottom line here is that $unwind
adds cost, and quite a lot of cost. It is basically going to add a copy of each document in the pipeline for processing "per" every array element contained in each document. So not is there only the cost of processing all of those produced things, but also a cost of "producing" them in the first place.
And finally as an approach
db.analysis.mapReduce(
function() {
var data = { "strokes": 0 ,"par": 0, "putts": 0, "teesPlayed": 0, "fairway": {} };
this.tees.forEach(function(tee) {
// Increment common values
data.strokes += tee.strokes;
data.par += tee.par;
data.putts += tee.putts;
data.teesPlayed++;
// Do dynamic keys
if (!data.fairway.hasOwnProperty(tee.fairway))
data.fairway[tee.fairway] = 0;
data.fairway[tee.fairway]++;
if (tee.hasOwnProperty('shotType')) {
if (!data.hasOwnProperty('shotType'))
data.shotType = {};
if (!data.shotType.hasOwnProperty(tee.shotType))
data.shotType[tee.shotType] = 0;
data.shotType[tee.shotType]++
}
});
emit(this.player.id,data);
},
function(key,values) {
var data = { "strokes": 0 ,"par": 0, "putts": 0, "teesPlayed": 0, "fairway": {} };
values.forEach(function(value) {
// Common keys
data.strokes += value.strokes;
data.par += value.par;
data.putts += value.putts;
data.teesPlayed += value.teesPlayed;
Object.keys(value.fairway).forEach(function(fairway) {
if (!data.fairway.hasOwnProperty(fairway))
data.fairway[fairway] = 0;
data.fairway[fairway] += value.fairway[fairway];
});
if (value.hasOwnProperty('shotType')) {
if (!data.hasOwnProperty('shotType'))
data.shotType = {};
Object.keys(value.shotType).forEach(function(shotType) {
if (!data.shotType.hasOwnProperty(shotType))
data.shotType[shotType] = 0;
data.shotType[shotType] += value.shotType[shotType];
});
}
});
return data;
},
{ "out": { "inline": 1 } }
)
And the output from this can be done immediately with the nested structure, but of course in the very mapReduce output form of "key/value", being that "key" is the grouping _id
and "value" contains all the output:
{
"_id" : "tdaYaSvXJueDq4oTN",
"value" : {
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3,
"fairway" : {
"straight" : 1,
"right" : 2
},
"shotType" : {
"bunker" : 1
}
}
}
The "out"
options for mapReduce are either the "inline"
as shown here where you can fit all the result in memory ( and within the 16MB BSON limit ), or alternately to another collection from which you can read later. There is a similar $out
for .aggregate()
, but this is generally negated by aggregation output being available as a "cursor", unless of course you really want it in a collection instead.
So it all depends on how you really want to approach this. If speed is of the upmost importance then .aggregate()
is generally going to yield the fastest results. On the other hand if you want to work "dynamically" with the produced "keys" then mapReduce
allows the logic to be generally self contained, without the need for another inspection pass to generate the required aggregation pipeline statement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With