I have a mongo collection of tweets and each document looks like this:
{ _id:'1234567',
date:'9/27/08 3:21',
tweet:'Some text here and some #hashtag and one more #hashtag2',
a_name:'name',
a_nick:'nick',
hashtags:['#hashtag' , '#hashtag2' ]}
I need to count all the occurrences of #hashtag and #hashtag2 in all of the entries using aggregation. So far I have something like this:
db.tweets.aggregate(
{ $project: { hashtags:1}},
{ $unwind: "$hashtags" },
{ $group: { _id: "hashtags", count: { $sum: 1 }}}
);
But that gives me a single count of all the hashtags together. If I remove the $group
stage I get a list of all the separate hashtags, which is good, but I would like to be able to count each of them using aggregation
and MongoDB only. Any ideas?
I think that you probably just have a typing mistake or otherwise a misunderstanding:
db.tweets.aggregate([
{ "$project": { "hashtags":1 }},
{ "$unwind": "$hashtags" },
{ "$group": { "_id": "$hashtags", "count": { "$sum": 1 } }}
])
So the value for _id
in the group needs to be "$hashtags"
rather than the "hashtags"
you have used. That way it uses the actual value of the field, and the result is the count of each hashtag.
Without the $
to declare that you want the value of the field, it is just a string literal. Grouping on a constant string puts every document into the same single group.
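To make the difference concrete, here is a minimal plain-JavaScript sketch of what the two group keys compute. It does not talk to MongoDB at all: the sample array and the groupCount helper are hypothetical stand-ins for the collection and for $group with { $sum: 1 }.

```javascript
// Hypothetical sample data standing in for the tweets collection.
const tweets = [
  { _id: "1", hashtags: ["#hashtag", "#hashtag2"] },
  { _id: "2", hashtags: ["#hashtag"] },
];

// Stand-in for $unwind: one entry per array element.
const unwound = tweets.flatMap((doc) => doc.hashtags);

// Stand-in for $group with { $sum: 1 }: count entries per key.
function groupCount(values, keyOf) {
  const counts = {};
  for (const value of values) {
    const key = keyOf(value);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Like _id: "$hashtags": the key is the field's value, one bucket per tag.
const perTag = groupCount(unwound, (tag) => tag);

// Like _id: "hashtags": the key is the same literal string every time.
const literal = groupCount(unwound, () => "hashtags");

console.log(perTag);  // { '#hashtag': 2, '#hashtag2': 1 }
console.log(literal); // { hashtags: 3 }
```

The second result is the behaviour described in the question: a single bucket holding the total.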
So that would give you the count for each tag. If you are in fact looking for the total number of "unique" tags without listing each tag, you can modify it like this:
db.tweets.aggregate([
{ "$project": { "hashtags":1 }},
{ "$unwind": "$hashtags" },
{ "$group": { "_id": "$hashtags" }},
{ "$group": { "_id": null, "count": { "$sum": 1 } }}
])
So that just counts the distinct tags. There is another way to do this using the $addToSet
operator, but it really just creates additional work in the pipeline and is not the best use case for that operator. But just for reference:
db.tweets.aggregate([
{ "$project": { "hashtags":1 }},
{ "$unwind": "$hashtags" },
{ "$group": {
"_id": null,
"hashtags": { "$addToSet": "$hashtags" }
}},
{ "$unwind": "$hashtags" },
{ "$group": { "_id": null, "count": { "$sum": 1 } }}
])
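As a rough sanity check that the two "unique tags" pipelines above arrive at the same number, the following plain-JavaScript sketch mimics both approaches in memory. The sample data and variable names are hypothetical, and nothing here is MongoDB API; it only mirrors the logic of the stages.

```javascript
// Hypothetical sample data standing in for the tweets collection.
const sampleTweets = [
  { hashtags: ["#hashtag", "#hashtag2"] },
  { hashtags: ["#hashtag"] },
  { hashtags: ["#other", "#hashtag2"] },
];

// Stand-in for $unwind: one entry per array element.
const allTags = sampleTweets.flatMap((doc) => doc.hashtags);

// Variant 1: group by tag value, then count the resulting groups.
const groups = new Map();
for (const tag of allTags) {
  groups.set(tag, true); // one entry per distinct _id
}
const countViaTwoGroups = groups.size;

// Variant 2: $addToSet stand-in. Collect unique tags into one array,
// "unwind" that array again, and sum 1 per element.
const tagSet = [];
for (const tag of allTags) {
  if (!tagSet.includes(tag)) tagSet.push(tag);
}
let countViaAddToSet = 0;
for (const _tag of tagSet) {
  countViaAddToSet += 1;
}

console.log(countViaTwoGroups, countViaAddToSet); // 3 3
```

The second variant does the extra work the answer mentions: it first builds the whole set in one document, then takes it apart again just to count it.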