I have a MongoDB collection. Put simply, it has two fields, user and url, and it holds 39274590 documents. The key of this collection is {user, url}.
Using Java, I am trying to list the distinct urls:
MongoDBManager db = new MongoDBManager( "Website", "UserLog" );
return db.getDistinct("url");
But I receive an exception:
Exception in thread "main" com.mongodb.CommandResult$CommandFailure: command failed [distinct]:
{ "serverUsed" : "localhost/127.0.0.1:27017" , "errmsg" : "exception: distinct too big, 16mb cap" , "code" : 10044 , "ok" : 0.0}
How can I solve this problem? Is there any plan B that can avoid this problem?
In version 2.6 and later you can use the aggregation pipeline's $out stage to write the results to a separate collection: http://docs.mongodb.org/manual/reference/operator/aggregation/out/
This gets around MongoDB's 16 MB limit on a result returned as a single document, which is the limit that distinct runs into. You can read more about using the aggregation framework on large datasets in MongoDB 2.6 here: http://vladmihalcea.com/mongodb-2-6-is-out/
To do a 'distinct' query with the aggregation framework, group by the field.
db.UserLog.aggregate([{$group: {_id: '$url'}}]);
Note: I don't know how this works for the Java driver, good luck.
Take a look at this answer:
1) The easiest way to do this is via the aggregation framework. This takes two "$group" stages: the first one groups by the distinct values, and the second one counts all of the distinct values.
2) If you want to do this with Map/Reduce you can. This is also a two-phase process: in the first phase we build a new collection with a list of every distinct value for the key. In the second we do a count() on the new collection.
Note that you cannot return the result of the map/reduce inline, because that will potentially overrun the 16MB document size limit. You can save the calculation in a collection and then count() the size of the collection, or you can get the number of results from the return value of mapReduce().
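To make the two phases concrete, here is a minimal in-memory sketch in plain Java. It is only an analogue of what the two steps compute, not MongoDB driver code: the LogEntry class, the method names and the sample data are all hypothetical, and a real collection of this size would have to be processed on the server as described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctCountSketch {

    // Hypothetical stand-in for one document of the collection (assumption).
    static final class LogEntry {
        final String user;
        final String url;

        LogEntry(String user, String url) {
            this.user = user;
            this.url = url;
        }
    }

    // Phase 1: build one group per distinct url.
    // This mirrors the idea of {$group: {_id: '$url'}}: the group keys are the distinct values.
    static Set<String> distinctUrls(List<LogEntry> entries) {
        return entries.stream()
                .collect(Collectors.groupingBy(e -> e.url))
                .keySet();
    }

    // Phase 2: count the groups produced by phase 1.
    static long countDistinctUrls(List<LogEntry> entries) {
        return distinctUrls(entries).size();
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<LogEntry> entries = Arrays.asList(
                new LogEntry("alice", "http://a.example"),
                new LogEntry("bob", "http://a.example"),
                new LogEntry("alice", "http://b.example"));

        System.out.println("distinct urls: " + distinctUrls(entries));
        System.out.println("count: " + countDistinctUrls(entries));
    }
}
```

The point of splitting the work this way is that the large intermediate result (the list of distinct values) is never returned as one document; only the small final count is.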