Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB Aggregation: Counting distinct fields

I am trying to write an aggregation to identify accounts that use multiple payment sources. Typical data would be.

{  account:"abc",  vendor:"amazon", }  ... {  account:"abc",  vendor:"overstock", } 

Now, I'd like to produce a list of accounts similar to this

{  account:"abc",  vendorCount:2 } 

How would I write this in Mongo's aggregation framework

like image 767
user1438162 Avatar asked Aug 29 '13 01:08

user1438162


People also ask

How can I count distinct values in MongoDB?

MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count. As noted in the MongoDB documentation: Results must not be larger than the maximum BSON size (16MB).

How do I use distinct aggregation in MongoDB?

You can use $addToSet with the aggregation framework to count distinct objects. Not a generic solution, if you have a large number of unique zip codes per result, this array would be very large.

How do I count multiple fields in MongoDB?

In MongoDB, when we have a large dataset inside the collection and we want to count where the field value is repeating on multiple fields then we use $group aggregation. Example: Here, we are taking an example in which we apply $group aggregation with multiple fields and get the count of duplicate field values.

Is aggregation fast in MongoDB?

On large collections of millions of documents, MongoDB's aggregation was shown to be much worse than Elasticsearch. Performance worsens with collection size when MongoDB starts using the disk due to limited system RAM. The $lookup stage used without indexes can be very slow.


2 Answers

I figured this out by using the $addToSet and $unwind operators.

Mongodb Aggregation count array/set size

db.collection.aggregate([ {     $group: { _id: { account: '$account' }, vendors: { $addToSet: '$vendor'} } }, {     $unwind:"$vendors" }, {     $group: { _id: "$_id", vendorCount: { $sum:1} } } ]); 

Hope it helps someone

like image 118
user1438162 Avatar answered Sep 23 '22 15:09

user1438162


I think its better if you execute query like following which will avoid unwind

db.t2.insert({_id:1,account:"abc",vendor:"amazon"}); db.t2.insert({_id:2,account:"abc",vendor:"overstock"});   db.t2.aggregate( { $group : { _id : { "account" : "$account", "vendor" : "$vendor" }, number : { $sum : 1 } } }, { $group : { _id : "$_id.account", number : { $sum : 1 } } } ); 

Which will show you following result which is expected.

{ "_id" : "abc", "number" : 2 } 
like image 45
Mayur Kataria Avatar answered Sep 23 '22 15:09

Mayur Kataria