So I want to use pipeline aggregation in MongoDB to query certain values from documents and then add them together.
A document from my "Albums" collection:
{
"_id" : ObjectId("5875ed1dc939408da0601f31"),
"AlbumName" : "Blurryface",
"Artist" : "21 Pilots",
"Date" : "20151110",
"Label" : "Fueled By Ramen",
"Writers" : "Tyler Joseph",
"Producer" : "Mike Elizondo",
"Songlist" : [
{
"_id" : ObjectId("5875e5e8c939408da0601d73"),
"SongID" : "1",
"SongName" : "Stressed Out",
"Artist" : "21 Pilots",
"Album" : "Blurryface",
"Duration:" : "200",
"nPlays" : 800000000,
"SongDataFile" : "data"
},
{
"_id" : ObjectId("5875e855c939408da0601dcc"),
"SongID" : "4",
"SongName" : "Heathens",
"Artist" : "21 Pilots",
"Album" : "Blurryface",
"Colaborator" : "NA",
"Duration:" : "320",
"nPlays" : 5000000,
"SongDataFile" : "data"
}
]
}
How can I make an aggregation pipeline that extracts the "nPlays" from the songs in the array and then add them together?
I'm asking here because I find the MongoDB documentation subpar: it has no examples of how to use the operators together. On top of that, the examples I find on Google only query with $gt and $lt, or reuse the same example with just $match and $group, which doesn't help with my problem at all.
In short:
How do I extract "nPlays" and add them together in a pipeline aggregation?
You have to unwind the embedded documents. The $unwind stage outputs one document for each subdocument in the Songlist field.
The resulting aggregation pipeline is the following:
db.Albums.aggregate([
    {$unwind: {path: "$Songlist"}},
    {$project: {"_id": 0, "AlbumName": 1, "Songlist.nPlays": 1}},
    {$group: {"_id": "$AlbumName", "sum": {"$sum": "$Songlist.nPlays"}}}
])
The result document is this:
{
"_id" : "Blurryface",
"sum" : 805000000
}
In summary, with the $unwind operation you flatten inner subdocuments. Then, with a simple $project you can retain only the fields you need (this stage is optional). Finally, using a $group, you can sum over the information you need.
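To make the effect of each stage concrete, here is a rough plain-JavaScript model of what $unwind and $group do to the sample document. It is only a sketch for illustration, not how MongoDB implements these stages; the helper names unwind and groupSum are made up for this example, and the sample data is trimmed to the fields that matter.

```javascript
// Plain-JavaScript model of the $unwind -> $group stages (illustration only).
const albums = [
  {
    AlbumName: "Blurryface",
    Songlist: [
      { SongName: "Stressed Out", nPlays: 800000000 },
      { SongName: "Heathens", nPlays: 5000000 }
    ]
  }
];

// Models $unwind: emit one copy of the parent document per array element,
// with the array field replaced by that single element.
function unwind(docs, field) {
  return docs.flatMap(doc =>
    (doc[field] || []).map(item => ({ ...doc, [field]: item }))
  );
}

// Models $group with a $sum accumulator: add up nPlays per AlbumName.
function groupSum(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc.AlbumName;
    totals.set(key, (totals.get(key) || 0) + doc.Songlist.nPlays);
  }
  return [...totals].map(([_id, sum]) => ({ _id, sum }));
}

const unwound = unwind(albums, "Songlist");
const result = groupSum(unwound);

console.log(unwound.length); // 2 (one document per song)
console.log(result);         // [ { _id: 'Blurryface', sum: 805000000 } ]
```

The intermediate array has one entry per song, which is why the number of documents flowing through the pipeline grows with the array size.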
Hope it helps.
For a more efficient solution that does not need multiple pipeline stages, I would suggest upgrading your MongoDB server to 3.4 (if you are on an earlier version) and using the new $reduce array operator to add up the field values in the Songlist array.
$reduce calculates the sum of the "Songlist.nPlays" fields by applying an expression to each element of the array and combining the results into a single value.
You can then use it as an expression in the $addFields stage to get the desired field along with the other fields:
db.collection.aggregate([
    {
        "$addFields": {
            "totalPlayDuration": {
                "$reduce": {
                    "input": "$Songlist",
                    "initialValue": 0,
                    "in": { "$add": ["$$value", "$$this.nPlays"] }
                }
            }
        }
    }
])
Sample Output
/* 1 */
{
"_id" : ObjectId("5875ed1dc939408da0601f31"),
"AlbumName" : "Blurryface",
"Artist" : "21 Pilots",
"Date" : "20151110",
"Label" : "Fueled By Ramen",
"Writers" : "Tyler Joseph",
"Producer" : "Mike Elizondo",
"Songlist" : [
{
"_id" : ObjectId("5875e5e8c939408da0601d73"),
"SongID" : "1",
"SongName" : "Stressed Out",
"Artist" : "21 Pilots",
"Album" : "Blurryface",
"Duration:" : "200",
"nPlays" : 800000000,
"SongDataFile" : "data"
},
{
"_id" : ObjectId("5875e855c939408da0601dcc"),
"SongID" : "4",
"SongName" : "Heathens",
"Artist" : "21 Pilots",
"Album" : "Blurryface",
"Colaborator" : "NA",
"Duration:" : "320",
"nPlays" : 5000000,
"SongDataFile" : "data"
}
],
"totalPlayDuration": 805000000
}
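If the $reduce expression looks unfamiliar, it may help to compare it with Array.prototype.reduce in plain JavaScript, which performs the same kind of fold. The snippet below is only an analogy run outside MongoDB: the accumulator plays the role of $$value, the current element plays the role of $$this, and the trailing 0 corresponds to initialValue. The variable names are chosen for this example.

```javascript
// Plain-JavaScript analogy for the $reduce expression (illustration only).
const songlist = [
  { SongName: "Stressed Out", nPlays: 800000000 },
  { SongName: "Heathens", nPlays: 5000000 }
];

// value ~ $$value, song ~ $$this, 0 ~ initialValue
const totalPlays = songlist.reduce((value, song) => value + song.nPlays, 0);

console.log(totalPlays); // 805000000
```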
NB:
A solution that uses $unwind may not be as efficient at scale. Expect a drop in performance with large arrays, because $unwind outputs a copy of each document per array entry. That uses more memory (each pipeline stage is limited to 100 MB of RAM unless allowDiskUse is enabled), and both producing the flattened documents and processing them afterwards takes time.
Also, a multi-stage solution requires knowledge of the document fields, because the $group stage only retains the fields you name explicitly with accumulators such as $first or $last. That can be a huge limitation if your query needs to be dynamic. So in essence it is more beneficial to take advantage of the new operators in MongoDB 3.4 and above, which offer improved aggregation pipeline performance.
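To illustrate the point about $group, this is roughly what the multi-stage version has to look like if you also want to keep some album fields. It is a sketch only: the choice of AlbumName and Artist as retained fields and the output name totalPlays are assumptions for this example, and every additional field you want to keep needs its own $first entry.

```javascript
// Sketch: $unwind + $group pipeline that retains selected fields by hand.
// Retained fields and the name "totalPlays" are assumptions for illustration.
const pipeline = [
  { $unwind: "$Songlist" },
  {
    $group: {
      _id: "$_id",                               // one group per album document
      AlbumName: { $first: "$AlbumName" },       // must be listed explicitly
      Artist: { $first: "$Artist" },             // must be listed explicitly
      totalPlays: { $sum: "$Songlist.nPlays" }   // the value we actually want
    }
  }
];

// In the mongo shell you would run it as:
//   db.Albums.aggregate(pipeline)
```

Any field not listed in the $group stage (Label, Producer, and so on) is dropped from the output, which is the limitation described above.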