Assume I have a collection called "posts" (in reality it is a more complex collection; posts is a simplified example) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" : "John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query for posts by tags, group the results by tag, and display the results paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([ { "$unwind" : "$tags" }, { "$group" : { _id: { tag: "$tags" }, count: { $sum: 1 } } } ]);
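To make the two stages concrete, here is a rough plain-JavaScript sketch of what $unwind followed by $group computes. It runs on an in-memory array rather than against MongoDB, and the second sample document is made up for illustration.

```javascript
// Plain JavaScript stand-in for the pipeline above; no MongoDB calls are made.
// The sample documents are illustrative assumptions, not real data.
const posts = [
  { title: "Lorem ipsum", tags: ["SOME", "RANDOM", "TAGS"] },
  { title: "Dolor sit", tags: ["RANDOM"] },
];

// $unwind: emit one record per (document, tag) pair.
const unwound = posts.flatMap((post) =>
  post.tags.map((tag) => ({ title: post.title, tags: tag }))
);

// $group with $sum: 1: count how many unwound records share each tag.
const counts = new Map();
for (const record of unwound) {
  counts.set(record.tags, (counts.get(record.tags) || 0) + 1);
}

// Shape the result like the pipeline's output documents.
const grouped = [...counts].map(([tag, count]) => ({ _id: { tag }, count }));
console.log(grouped);
```

The length of `grouped` is the number the paginator needs, which is why the count and the grouped documents are wanted from the same pass.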
The catch is that to create the paginator I would need to know the length of the output array. I know that you can get it like this:
db.posts.aggregate([ { "$unwind" : "$tags" }, { "$group" : { _id: { tag: "$tags" }, count: { $sum: 1 } } }, { "$group" : { _id: null, total: { $sum: 1 } } } ]);
But that would discard the output of the previous pipeline stage (the first group). Is there a way to combine the two operations while preserving each stage's output? I know that the output of the whole aggregate operation can be cast to an array in some language and its contents counted, but the pipeline output may exceed the 16 MB limit. Also, performing the same query again just to obtain the count seems like a waste.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save tag and count into tmp.
Use $push or $addToSet to store tmp into your data list.

Code:
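db.test.aggregate([
    // One document per tag value
    {$unwind: '$tags'},
    // Count the documents for each tag
    {$group: {_id: '$tags', count: {$sum: 1}}},
    // Pack tag and count into a single sub-document
    {$project: {tmp: {tag: '$_id', count: '$count'}}},
    // Collapse everything into one document: total number of tags plus the list
    {$group: {_id: null, total: {$sum: 1}, data: {$addToSet: '$tmp'}}}
])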
Output:
{
    "result" : [
        {
            "_id" : null,
            "total" : 5,
            "data" : [
                { "tag" : "SOME", "count" : 1 },
                { "tag" : "RANDOM", "count" : 2 },
                { "tag" : "TAGS1", "count" : 1 },
                { "tag" : "TAGS", "count" : 1 },
                { "tag" : "SOME1", "count" : 1 }
            ]
        }
    ],
    "ok" : 1
}
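
Note that the pipeline above packs every tag into a single result document, so a very large tag set could still run into the 16 MB limit mentioned in the question. As a hedged alternative (a different technique from the one above), the $facet stage, assuming MongoDB 3.4 or later, can return the total and just one page of results in the same pass. The helper name buildTagPagePipeline and its parameters page and pageSize are hypothetical, introduced only for this sketch.

```javascript
// Hypothetical helper: builds a pipeline that returns one page of tag counts
// plus the total number of distinct tags. Assumes $facet (MongoDB 3.4+).
function buildTagPagePipeline(page, pageSize) {
  return [
    { $unwind: "$tags" },
    { $group: { _id: "$tags", count: { $sum: 1 } } },
    // Sort on the unique group key so page boundaries stay stable.
    { $sort: { _id: 1 } },
    {
      $facet: {
        // Sub-pipeline 1: number of distinct tags, shaped like [ { total: N } ].
        total: [{ $count: "total" }],
        // Sub-pipeline 2: only the requested page (page is 1-based).
        data: [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }],
      },
    },
  ];
}

const pipeline = buildTagPagePipeline(2, 20);
// In the shell this would be run as: db.posts.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Because only one page is kept in data, the result document stays small regardless of how many tags exist. When nothing matches, the total array comes back empty rather than containing a zero, so the caller should treat a missing entry as 0.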