I need to pick a document from a collection at random (alternatively - a small number of successive documents from a randomly-positioned "window").
I've found two solutions: 1 and 2. The first is unacceptable since I anticipate large collection size and wish to minimize the document size. The second seems ineffective (I'm not sure about the complexity of skip
operation). And here one can find a mention of querying a document with a specified index, but I don't know how to do it (I'm using C++ driver).
Are there other solutions to the problem? Which is the most efficient?
I had a similar issue once. In my case, I had a date property on my documents. I knew the earliest date possible in the dataset so in my application code, I would generate a random date within the range of EARLIEST_DATE_IN_SET and NOW and then query mongodb using a GTE query on the date property and simply limit it to 1 result.
There was a small chance that the random date would be greater than the highest date in the data set, so i accounted for that in the application code.
With an index on the date property, this was a super fast query.
It seems like you could mold solution 1 there, (assuming your _id key was an auto-inc value), then just do a count on your records, and use that as the upper limit for a random int in c++, then grab that row.
Likewise, if you don't have an autoinc _id key, just create one with your results.. having an additional field with an INT shouldn't add that much to your document size.
If you don't have an auto-inc field Mongo talks about how to quickly add one here:
Auto Inc Field.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With