
Random Sampling from Mongo

I have a MongoDB collection in which every document has a field whose value is either 0 or 1. I need to draw a random sample of 1000 documents and count how many of them have that field set to 1, and I need to repeat this sampling 1000 times. How do I do it?

Aditya Singh asked Sep 30 '12 20:09


2 Answers

For people arriving at this answer now: use the $sample aggregation stage, introduced in MongoDB 3.2.

https://docs.mongodb.org/manual/reference/operator/aggregation/sample/

db.collection_of_things.aggregate(
   [ { $sample: { size: 15 } } ]
)

Then add a $group stage to the pipeline to count the 0s and 1s in the sample. The MongoDB docs have an example of counting with $group.
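To make the $sample plus $group combination concrete, here is a minimal sketch of the pipeline as a small helper. The field name "flag" and the helper name buildSampleCountPipeline are placeholders of mine (the question never names the 0/1 field), and the commented loop assumes a MongoDB 3.2+ shell and the collection name used above.

```javascript
// Build a pipeline that samples `size` documents at random and counts
// how many of them have `field` equal to 1.
// `field` is a hypothetical name; substitute your own 0/1 field.
function buildSampleCountPipeline(size, field) {
  return [
    { $sample: { size: size } },
    // The field is always 0 or 1, so summing it counts the 1s.
    // `sampled` records how many documents the sample actually contained.
    { $group: { _id: null, sampled: { $sum: 1 }, ones: { $sum: "$" + field } } }
  ];
}

// Repeating the sample 1000 times in the mongo shell (not run here):
// var counts = [];
// for (var i = 0; i < 1000; i++) {
//   var res = db.collection_of_things
//     .aggregate(buildSampleCountPipeline(1000, "flag"))
//     .toArray();
//   counts.push(res.length ? res[0].ones : 0);
// }
```

Each run returns a single document such as { _id: null, sampled: ..., ones: ... }, so the count of 1s for that sample is its `ones` value.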

dalanmiller answered Oct 23 '22 22:10


For MongoDB 3.0 and earlier, I use an old trick from my SQL days (which I think Wikipedia uses for its random page feature). I store a random number between 0 and 1 in every document I need to randomize, in a field I'll call "r", and then add an index on "r".

db.coll.ensureIndex({r: 1});

Now, to get x random documents, use:

var startVal = Math.random();
db.coll.find({r: {$gt: startVal}}).sort({r: 1}).limit(x);

This gives you random documents in a single find query. Depending on your needs, this may be overkill, but if you are going to do a lot of sampling over time, it is a very efficient approach that puts little load on your backend.
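One edge case in the query above: if startVal lands close to 1, fewer than x documents have a larger "r". The sketch below illustrates the trick on a plain in-memory array, with a wrap-around step that I am adding as an assumption (it is not part of the original query). The function name sampleByRandomField is hypothetical, and `docs` stands in for a collection whose documents already carry "r".

```javascript
// In-memory illustration of the indexed-random-field trick.
// `docs` is an array of objects that each have a random "r" in [0, 1).
function sampleByRandomField(docs, x, startVal) {
  // Equivalent of sort({r: 1}): order by r ascending.
  var sorted = docs.slice().sort(function (a, b) { return a.r - b.r; });
  // Equivalent of find({r: {$gt: startVal}}).limit(x).
  var picked = sorted
    .filter(function (d) { return d.r > startVal; })
    .slice(0, x);
  // Assumed extension: if too few documents lie above startVal, wrap around
  // and take the lowest r values (a second query using $lte in a real database).
  if (picked.length < x) {
    var rest = sorted
      .filter(function (d) { return d.r <= startVal; })
      .slice(0, x - picked.length);
    picked = picked.concat(rest);
  }
  return picked;
}

// Usage: var sample = sampleByRandomField(docs, 1000, Math.random());
```

Note that each sample is a contiguous run in "r" order, so documents with neighbouring "r" values tend to be drawn together; if you repeat the sampling many times, re-randomizing "r" now and then may be worth considering.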

Nic Cottrell answered Oct 23 '22 23:10