 

Count tags in a tag array across the documents of a MongoDB collection

I have a mongo collection of tweets and each document looks like this:

{ _id:'1234567', 
date:'9/27/08 3:21', 
tweet:'Some text here and some #hashtag and one more #hashtag2', 
a_name:'name', 
a_nick:'nick', 
hashtags:['#hashtag' , '#hashtag2' ]} 

I need to count all the occurrences of #hashtag and #hashtag2 in all of the entries using aggregation. So far I have something like this:

 db.tweets.aggregate(
 { $project: { hashtags:1}},
 { $unwind: "$hashtags" },
 { $group: { _id: "hashtags", count: { $sum: 1 }}}
 );

But that gives me a single count of all the hashtags together. If I remove the $group line I get a list with all the separate hashtags, which is good, but I would like to count them using only MongoDB aggregation. Any ideas?
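To see why removing $group gives "a list with all the separate hashtags", here is a plain-JavaScript simulation of what `$project: { hashtags: 1 }` followed by `$unwind: "$hashtags"` produces. This is a sketch, not the MongoDB driver API: the sample tweets and the helper names `project` and `unwind` are hypothetical, and only the default `$unwind` behaviour (missing or empty arrays emit nothing) is modelled.

```javascript
// Plain-JavaScript simulation (NOT the MongoDB API) of
//   { $project: { hashtags: 1 } }, { $unwind: "$hashtags" }
// The sample documents and helper names are hypothetical.
const tweets = [
  { _id: "1", tweet: "Some text #hashtag and #hashtag2", hashtags: ["#hashtag", "#hashtag2"] },
  { _id: "2", tweet: "Only #hashtag here", hashtags: ["#hashtag"] },
];

// $project: keep _id (included by default) and the hashtags field only.
function project(docs) {
  return docs.map((d) => ({ _id: d._id, hashtags: d.hashtags }));
}

// $unwind: emit one copy of the document per array element, with the
// array field replaced by that single element. Assumes the field is an
// array or missing; missing/null/empty arrays emit nothing (the default).
function unwind(docs, field) {
  const out = [];
  for (const d of docs) {
    for (const value of d[field] ?? []) {
      out.push({ ...d, [field]: value });
    }
  }
  return out;
}

const unwound = unwind(project(tweets), "hashtags");
// unwound → [{ _id: "1", hashtags: "#hashtag" },
//            { _id: "1", hashtags: "#hashtag2" },
//            { _id: "2", hashtags: "#hashtag" }]
console.log(unwound);
```

Each tag occurrence becomes its own document, which is exactly what a following $group stage needs in order to count tags.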

lesolorzanov asked Apr 08 '14 01:04




1 Answer

I think you probably just have a typo, or else a misunderstanding of how field references work:

db.tweets.aggregate([
   { "$project": { "hashtags":1 }},  
   { "$unwind": "$hashtags" },  
   { "$group": { "_id": "$hashtags", "count": { "$sum": 1 } }}  
])

The value for _id in the $group stage needs to be "$hashtags" rather than the "hashtags" you have used. The $ prefix makes it use the actual value of the field, so the result is a count for each hashtag.
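As a rough illustration of what the corrected pipeline computes, here is a plain-JavaScript simulation of `$unwind: "$hashtags"` followed by `$group: { _id: "$hashtags", count: { $sum: 1 } }`. It is a sketch over hypothetical in-memory tweets, not the MongoDB API; `countPerTag` is a made-up helper name.

```javascript
// Plain-JavaScript simulation (NOT the MongoDB API) of
//   { $unwind: "$hashtags" },
//   { $group: { _id: "$hashtags", count: { $sum: 1 } } }
// over hypothetical sample tweets.
const tweets = [
  { _id: "1", hashtags: ["#hashtag", "#hashtag2"] },
  { _id: "2", hashtags: ["#hashtag"] },
  { _id: "3", hashtags: [] },
];

function countPerTag(docs) {
  const counts = new Map(); // group key -> running $sum
  for (const doc of docs) {
    // $unwind: one iteration per array element (missing/empty arrays emit nothing)
    for (const tag of doc.hashtags ?? []) {
      // "$hashtags" is a field path, so the group key is the tag's value
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  // Shape each group like a $group output document. Note that MongoDB does
  // not guarantee $group output order; add a $sort stage if order matters.
  return [...counts].map(([tag, count]) => ({ _id: tag, count }));
}

const perTag = countPerTag(tweets);
// perTag → [{ _id: "#hashtag", count: 2 }, { _id: "#hashtag2", count: 1 }]
```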

Without the $ to declare that you want the value of the field, "hashtags" is just a constant string. Every document then gets the same grouping key, so everything ends up in a single group.
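A small plain-JavaScript sketch makes the difference between the two `_id` forms concrete: a string starting with `$` is resolved as a field path, anything else is a constant shared by every document. The data and the helpers `groupKey` and `groupCount` are hypothetical, and only top-level field paths (no dotted paths) are handled.

```javascript
// Plain-JavaScript sketch (NOT the MongoDB API) contrasting a literal
// _id with a "$"-prefixed field path in $group. Hypothetical data,
// already unwound to one tag per document.
const unwoundDocs = [
  { hashtags: "#hashtag" },
  { hashtags: "#hashtag2" },
  { hashtags: "#hashtag" },
];

// Hypothetical helper: resolve a $group _id expression against one document.
// Only top-level field paths are supported in this sketch.
function groupKey(idExpr, doc) {
  return typeof idExpr === "string" && idExpr.startsWith("$")
    ? doc[idExpr.slice(1)] // field path: use the field's value
    : idExpr;              // literal: the same key for every document
}

// Hypothetical helper: $group with count: { $sum: 1 }
function groupCount(docs, idExpr) {
  const counts = new Map();
  for (const doc of docs) {
    const key = groupKey(idExpr, doc);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

const withLiteral = groupCount(unwoundDocs, "hashtags"); // the question's version
const withPath = groupCount(unwoundDocs, "$hashtags");   // the corrected version
// withLiteral → [{ _id: "hashtags", count: 3 }]
// withPath    → [{ _id: "#hashtag", count: 2 }, { _id: "#hashtag2", count: 1 }]
```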

The corrected pipeline gives you the count for each tag. If you are actually looking for the total number of "unique" tags without listing each tag, you can modify it like this:

db.tweets.aggregate([
   { "$project": { "hashtags":1 }},  
   { "$unwind": "$hashtags" },  
   { "$group": { "_id": "$hashtags" }},
   { "$group": { "_id": null, "count": { "$sum": 1 } }}
])
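What the two consecutive $group stages compute can be sketched in plain JavaScript: the first $group leaves one document per distinct tag, and the second (with `_id: null`) counts those documents. This is a simulation over hypothetical data, not the MongoDB API, and `countDistinctTags` is a made-up name. Like MongoDB, it returns no document at all when nothing reaches the final $group.

```javascript
// Plain-JavaScript simulation (NOT the MongoDB API) of
//   $unwind -> $group by "$hashtags" -> $group { _id: null, count: { $sum: 1 } }
// over hypothetical sample tweets.
const tweets = [
  { hashtags: ["#hashtag", "#hashtag2"] },
  { hashtags: ["#hashtag"] },
  { hashtags: ["#hashtag3", "#hashtag"] },
];

function countDistinctTags(docs) {
  // $unwind + first $group: one output document per distinct tag.
  const groups = new Map();
  for (const doc of docs) {
    for (const tag of doc.hashtags ?? []) {
      groups.set(tag, { _id: tag });
    }
  }
  // With no input documents, a $group stage emits nothing.
  if (groups.size === 0) return [];

  // Second $group with _id: null: a single bucket adding 1 per document.
  let count = 0;
  for (const _group of groups.values()) count += 1;
  return [{ _id: null, count }];
}

const distinctResult = countDistinctTags(tweets);
// distinctResult → [{ _id: null, count: 3 }]
```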

The second $group simply summarizes the per-tag groups into one count. There is another way to do this using the $addToSet operator, but it really just creates additional work in the pipeline and is not the best use case for that operator. Just for reference:

db.tweets.aggregate([
   { "$project": { "hashtags":1 }},  
   { "$unwind": "$hashtags" },  
   { "$group": { 
       "_id": null, 
       "hashtags": { "$addToSet": "$hashtags" }
   }},
   { "$unwind": "$hashtags" },
   { "$group": { "_id": null, "count": { "$sum": 1 } }}
])
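To show where the "additional work" comes from, here is a plain-JavaScript sketch of the $addToSet variant: every distinct tag is first collected into one array inside a single document, and that array is then unwound and counted again. The data and the name `countDistinctViaAddToSet` are hypothetical; this is a simulation, not the MongoDB API, and MongoDB does not guarantee the order of elements produced by $addToSet.

```javascript
// Plain-JavaScript simulation (NOT the MongoDB API) of
//   $unwind -> $group { _id: null, hashtags: { $addToSet: "$hashtags" } }
//   -> $unwind -> $group { _id: null, count: { $sum: 1 } }
// over hypothetical sample tweets.
const tweets = [
  { hashtags: ["#hashtag", "#hashtag2"] },
  { hashtags: ["#hashtag"] },
  { hashtags: ["#hashtag3", "#hashtag"] },
];

function countDistinctViaAddToSet(docs) {
  // $unwind + $group with $addToSet: gather distinct tags into one set.
  const set = new Set();
  for (const doc of docs) {
    for (const tag of doc.hashtags ?? []) set.add(tag);
  }
  if (set.size === 0) return []; // nothing reaches $group -> no output

  // The single intermediate document holding every distinct tag.
  const single = { _id: null, hashtags: [...set] };

  // Extra work: $unwind that array again, then $group with $sum: 1.
  let count = 0;
  for (const _tag of single.hashtags) count += 1;
  return [{ _id: null, count }];
}

const viaSet = countDistinctViaAddToSet(tweets);
// viaSet → [{ _id: null, count: 3 }]
```

The answer is the same as with two $group stages; the difference is only the extra build-then-unwind step on one potentially large array.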
Neil Lunn answered Sep 20 '22 14:09