MongoDB select count(distinct x) on an indexed column - count unique results for large data sets

Tags:

mongodb

I have gone through several articles and examples, and have yet to find an efficient way to do this SQL query (`select count(distinct x)` on an indexed column) in MongoDB, where there are millions of ~~rows~~ documents.

First attempt

(e.g. from this almost duplicate question - Mongo equivalent of SQL's SELECT DISTINCT?)

    db.myCollection.distinct("myIndexedNonUniqueField").length

Unsurprisingly, this failed with the following error, since my data set is huge:

    Thu Aug 02 12:55:24 uncaught exception: distinct failed: {
            "errmsg" : "exception: distinct too big, 16mb cap",
            "code" : 10044,
            "ok" : 0
    }

Second attempt

I decided to try and do a group

    db.myCollection.group({key: {myIndexedNonUniqueField: 1},
                    initial: {count: 0},
                    reduce: function (obj, prev) { prev.count++;} } );

But I got this error message instead:

    exception: group() can't handle more than 20000 unique keys

Third attempt

I haven't tried this yet, but there are several suggestions that involve `mapReduce`

e.g.

- this one how to do distinct and group in mongodb? (not accepted, answer author / OP didn't test it)
- this one MongoDB group by Functionalities (seems similar to Second Attempt)
- this one http://blog.emmettshear.com/post/2010/02/12/Counting-Uniques-With-MongoDB
- this one https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/trDn3jJjqtE
- this one http://cookbook.mongodb.org/patterns/unique_items_map_reduce/

Also

It seems there is a pull request on GitHub changing the `.distinct` method so that you can specify that it should return only a count, but it's still open: https://github.com/mongodb/mongo/pull/34

But at this point I thought it was worth asking here: what is the latest on the subject? Should I move to SQL or another NoSQL DB for distinct counts, or is there an efficient way?

Update:

This comment on the MongoDB official docs is not encouraging, is this accurate?

http://www.mongodb.org/display/DOCS/Aggregation#comment-430445808

Update2:

Seems the new Aggregation Framework answers the above comment... (MongoDB 2.1/2.2 and above, development preview available, not for production)

http://docs.mongodb.org/manual/applications/aggregation/


asked Aug 02 '12 17:08

Eran Medan

2 Answers

1) The easiest way to do this is via the aggregation framework. This takes two `$group` stages: the first groups the documents by the field, producing one document per distinct value, and the second counts those documents.

    pipeline = [
        { $group: { _id: "$myIndexedNonUniqueField" } },
        { $group: { _id: 1, count: { $sum: 1 } } }
    ];

    //
    // Run the aggregation command
    //
    R = db.runCommand(
        {
        "aggregate": "myCollection",
        "pipeline": pipeline
        }
    );
    printjson(R);
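To make the two stages concrete, here is a minimal in-memory sketch of what each `$group` computes. It is plain JavaScript over an array, not MongoDB's implementation, and the names `groupByField`, `countAll` and `sampleDocs` are made up for illustration.

```javascript
// In-memory illustration only; MongoDB runs these stages server-side.

// Stage 1, like { $group: { _id: "$myIndexedNonUniqueField" } }:
// emits one document per distinct value of the field.
function groupByField(docs, field) {
  const seen = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!seen.has(key)) {
      seen.set(key, { _id: key });
    }
  }
  return Array.from(seen.values());
}

// Stage 2, like { $group: { _id: 1, count: { $sum: 1 } } }:
// puts every incoming document in one group and adds 1 for each.
function countAll(docs) {
  if (docs.length === 0) {
    return []; // no input documents means no groups
  }
  return [{ _id: 1, count: docs.length }];
}

// Hypothetical sample data: five documents, three distinct values.
const sampleDocs = [
  { myIndexedNonUniqueField: "a" },
  { myIndexedNonUniqueField: "b" },
  { myIndexedNonUniqueField: "a" },
  { myIndexedNonUniqueField: "c" },
  { myIndexedNonUniqueField: "b" }
];

const distinctStage = groupByField(sampleDocs, "myIndexedNonUniqueField");
const countStage = countAll(distinctStage);
console.log(countStage[0].count); // 3
```

The point of the sketch is that the second stage never sees the original documents, only one document per distinct value, so its count is the distinct count.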

2) If you want to do this with Map/Reduce you can. This is also a two-phase process: in the first phase we build a new collection with a list of every distinct value for the key. In the second we do a count() on the new collection.

    var SOURCE = db.myCollection;
    var DEST = db.distinct;
    DEST.drop();

    map = function() {
      emit( this.myIndexedNonUniqueField , {count: 1});
    };

    reduce = function(key, values) {
      var count = 0;

      values.forEach(function(v) {
        count += v['count'];        // count each distinct value for lagniappe
      });

      return {count: count};
    };

    //
    // run map/reduce
    //
    res = SOURCE.mapReduce( map, reduce,
        { out: 'distinct',
          verbose: true
        }
    );

    print( "distinct count= " + res.counts.output );
    print( "distinct count=", DEST.count() );

Note that you cannot return the result of the map/reduce inline, because that will potentially overrun the 16MB document size limit. You can save the calculation in a collection and then count() the size of the collection, or you can get the number of results from the return value of mapReduce().
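As a rough illustration of the two phases, the sketch below imitates the map/reduce flow in memory. It is not how MongoDB executes the job: `runMapReduce` is a hypothetical helper, and `emit` is passed to the map function as an argument here, whereas the server provides it as a global.

```javascript
// In-memory imitation of the map/reduce flow; illustration only.
function runMapReduce(docs, map, reduce) {
  // Phase 1a: collect every emitted value under its key.
  const emitted = new Map();
  const emit = function (key, value) {
    if (!emitted.has(key)) {
      emitted.set(key, []);
    }
    emitted.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit); // `this` is the current document, as in MongoDB
  }

  // Phase 1b: reduce each key to one value. Like MongoDB, skip the
  // reduce call when a key has only a single emitted value.
  const output = new Map();
  for (const [key, values] of emitted) {
    output.set(key, values.length === 1 ? values[0] : reduce(key, values));
  }
  return output; // stands in for the output collection
}

const mapFn = function (emit) {
  emit(this.myIndexedNonUniqueField, { count: 1 });
};

const reduceFn = function (key, values) {
  let count = 0;
  values.forEach(function (v) {
    count += v.count;
  });
  return { count: count };
};

// Hypothetical sample data: five documents, three distinct values.
const docsForMapReduce = [
  { myIndexedNonUniqueField: "a" },
  { myIndexedNonUniqueField: "b" },
  { myIndexedNonUniqueField: "a" },
  { myIndexedNonUniqueField: "c" },
  { myIndexedNonUniqueField: "b" }
];

const outputCollection = runMapReduce(docsForMapReduce, mapFn, reduceFn);

// Phase 2: the distinct count is the number of keys in the output.
console.log("distinct count=", outputCollection.size); // distinct count= 3
```

The per-key counts are a by-product; only the number of keys in the output matters for the distinct count, which is why counting the output collection is enough.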


answered Sep 23 '22 03:09

William Z

    db.myCollection.aggregate(
        {$group : {_id : "$myIndexedNonUniqueField"} },
        {$group: {_id:1, count: {$sum : 1 }}});

straight to result:

    db.myCollection.aggregate(
        {$group : {_id : "$myIndexedNonUniqueField"} },
        {$group: {_id:1, count: {$sum : 1 }}})
       .result[0].count;
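The `.result[0].count` access assumes the shell hands back a single document with a `result` array, which is what the 2.1/2.2 shell did. As far as I know, later shells (2.6 onward) return a cursor from `aggregate()` instead, so `.result` is undefined there. A small sketch of a helper that tolerates both shapes, assuming you pass it either the legacy document or the array you get from calling `toArray()` on the cursor; `extractDistinctCount` is a made-up name, not part of any MongoDB API:

```javascript
// Hypothetical helper, not a MongoDB API: pulls the count out of
// either result shape produced by the two-$group pipeline.
function extractDistinctCount(aggregateOutput) {
  // Newer shape: an array of documents, e.g. from cursor.toArray().
  // Legacy shape: { result: [ ... ], ok: 1 }.
  const docs = Array.isArray(aggregateOutput)
    ? aggregateOutput
    : (aggregateOutput && aggregateOutput.result) || [];

  // An empty collection yields no groups, so report zero.
  return docs.length > 0 ? docs[0].count : 0;
}

// Legacy-style output
console.log(extractDistinctCount({ result: [{ _id: 1, count: 42 }], ok: 1 })); // 42

// Array-style output
console.log(extractDistinctCount([{ _id: 1, count: 42 }])); // 42
```

Returning 0 for an empty result is a design choice for this sketch; the pipeline itself simply returns no document in that case.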

answered Sep 19 '22 03:09

Stackee007
