I want to get a single random document from a MongoDB collection. My collection currently contains more than 1 billion documents. How can I get a single random document from that collection?
To clone a document, hover over the desired document and click the Clone button. When you click the Clone button, Compass opens the document insertion dialog with the same schema and values as the cloned document. You can edit any of these fields and values before you insert the new document.
On the MongoDB shell you can do: db. collectionName.
You can select a single field in MongoDB using the following syntax: db.yourCollectionName.find({"yourFieldName":yourValue},{"yourSingleFieldName":1,_id:0});
I have never worked with MongoDB from Python, but there is a general solution to your problem. Here is a MongoDB shell script for obtaining a single random document:
N = db.collection.count(condition)
db.collection.find(condition).limit(1).skip(Math.floor(Math.random()*N))
Here, condition is a MongoDB query. If you want to query the entire collection, use condition = null.
It's a general solution, so it works with any MongoDB driver.
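Since the question is about Python, here is a minimal sketch of the same count-then-skip idea. It assumes collection behaves like a pymongo Collection (count_documents, find, and a cursor with skip and limit). The function name random_document is made up for illustration.

```python
import random


def random_document(collection, condition=None):
    """Return one random document matching condition, or None if nothing matches.

    Assumes `collection` behaves like a pymongo Collection.
    """
    query = condition or {}          # an empty filter matches the whole collection
    n = collection.count_documents(query)
    if n == 0:
        return None
    offset = random.randrange(n)     # uniform integer in [0, n)
    cursor = collection.find(query).skip(offset).limit(1)
    # The cursor yields at most one document; None covers a concurrent delete.
    return next(iter(cursor), None)
```

As with the shell version, the server still has to walk past offset documents, so the cost grows with the size of the collection.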
I ran a benchmark to test several implementations. First, I created a test collection of 5567249 documents, each with an indexed random field rnd.
I chose three methods to compare with each other:
First method:
db.collection.find().limit(1).skip(Math.floor(Math.random()*N))
Second method:
db.collection.find({rnd: {$gte: Math.random()}}).sort({rnd:1}).limit(1)
Third method:
db.collection.findOne({rnd: {$gte: Math.random()}})
I ran each method 10 times and got its average computing time:
method 1: 882.1 msec
method 2: 1.2 msec
method 3: 0.6 msec
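For anyone who wants to repeat this kind of measurement from Python, a small timing helper might look like the sketch below. The name average_ms is hypothetical, and fn stands for any zero-argument callable that runs one of the queries. This is not the harness used to produce the numbers above.

```python
import time


def average_ms(fn, runs=10):
    """Call fn() `runs` times and return the mean wall-clock time in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs
```

With ten runs the averages are noisy, so small differences between methods should be read with care.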
This benchmark shows that my solution is not the fastest one.
But the third method is not a good one either, because it returns the first document in the database (in natural order) with rnd >= random(). So its output is not truly random.
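The difference between methods 2 and 3 can be shown without a database. The sketch below is a pure-Python imitation. It assumes, as stated above, that findOne returns the first match in natural (stored) order. The helper names and the three sample values are invented for illustration.

```python
def first_in_natural_order(rnd_values, r):
    """Mimic method 3: first value, in stored order, with rnd >= r."""
    return next((v for v in rnd_values if v >= r), None)


def smallest_at_or_above(rnd_values, r):
    """Mimic method 2: smallest rnd >= r, as sort({rnd: 1}).limit(1) would give."""
    return min((v for v in rnd_values if v >= r), default=None)


stored = [0.9, 0.1, 0.5]   # rnd values in insertion (natural) order
for r in (0.05, 0.3, 0.7):
    print(r, first_in_natural_order(stored, r), smallest_at_or_above(stored, r))
```

For all three thresholds the method 3 imitation returns the document with rnd 0.9, simply because it was stored first, while the method 2 imitation returns 0.1, 0.5 and 0.9 respectively.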
I think the second method is the best one for frequent use. But it has one drawback: it requires adding a field to every document in the collection and maintaining an additional index.
Add an additional field named random to the documents in your collection and make sure its value is between 0 and 1. You can generate random floating-point values between 0 and 1 for this field with random.random(); for example, [random.random() for _ in range(0, 10)] produces ten of them.
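One way to backfill the field from Python is sketched below. It assumes collection behaves like a pymongo Collection (find with a projection, update_one). The function name assign_random_field is hypothetical. It updates one document at a time, which is simple but likely slow on a very large collection.

```python
import random


def assign_random_field(collection, field="random"):
    """Give every document that lacks `field` a random float in [0, 1).

    Returns the number of documents updated.
    """
    updated = 0
    # Fetch only the _id of documents that do not have the field yet.
    for doc in collection.find({field: {"$exists": False}}, {"_id": 1}):
        collection.update_one({"_id": doc["_id"]}, {"$set": {field: random.random()}})
        updated += 1
    return updated
```

Afterwards, an index on the field (for example via collection.create_index("random")) keeps the $gte lookup fast.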
Then:
import random
# mongodb is assumed to be an existing pymongo Database handle.
collection = mongodb["collection_name"]
rand = random.random()  # rand is a floating-point number in [0, 1)
random_record = collection.find_one({'random': {'$gte': rand}})
MongoDB should get a native implementation in due course. The feature request is filed here: https://jira.mongodb.org/browse/SERVER-533
Not yet implemented at time of writing.