
MongoDB: what is the most efficient way to query a single random document?

Tags: c++, mongodb

I need to pick a document from a collection at random (or, alternatively, a small number of successive documents from a randomly positioned "window"). I've found two solutions: 1 and 2. The first is unacceptable because I anticipate a large collection and want to keep each document as small as possible. The second seems inefficient (I'm not sure about the complexity of the skip operation). And here one can find a mention of querying a document by a specified index, but I don't know how to do that (I'm using the C++ driver).

Are there other solutions to the problem? Which is the most efficient?

Violet Giraffe asked Nov 09 '11 18:11



2 Answers

I had a similar issue once. In my case, the documents had a date property, and I knew the earliest possible date in the dataset. In my application code I would generate a random date between EARLIEST_DATE_IN_SET and NOW, then query MongoDB with a GTE ($gte) condition on the date property, limited to 1 result.

There was a small chance that the random date would be later than the latest date in the dataset, so I accounted for that in the application code.

With an index on the date property, this was a very fast query.
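The random-date approach above could be sketched roughly as follows. This is not driver code: a sorted `std::map` stands in for the indexed date field, and `lower_bound` plays the role of the "GTE, limit 1" query. The names `DateIndex`, `first_at_or_after` and `random_document` are made up for illustration, and falling back to the newest document on a miss is only one assumed way to "account for" a random date past the end of the data.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <random>
#include <string>

// Hypothetical stand-in for a collection with an index on its date field:
// key = date as epoch seconds, value = the document.
using DateIndex = std::map<std::int64_t, std::string>;

// Equivalent of "date >= pivot, limit 1": the first document at or after
// the pivot, or nothing if the pivot is later than every stored date.
std::optional<std::string> first_at_or_after(const DateIndex& index,
                                             std::int64_t pivot) {
    auto it = index.lower_bound(pivot);
    if (it == index.end()) return std::nullopt;
    return it->second;
}

// Picks a random date in [earliest, now] and returns the first document at
// or after it. If the random date lands past the newest document, this
// sketch falls back to the newest document (an assumption, see above).
template <class Rng>
std::optional<std::string> random_document(const DateIndex& index,
                                           std::int64_t earliest,
                                           std::int64_t now,
                                           Rng& rng) {
    assert(earliest <= now);
    if (index.empty()) return std::nullopt;
    std::uniform_int_distribution<std::int64_t> pick(earliest, now);
    if (auto doc = first_at_or_after(index, pick(rng))) return doc;
    return std::prev(index.end())->second;
}
```

Note that this picks uniformly over dates, not over documents: a document that follows a long gap in the dates is selected more often than one in a dense cluster.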

Bryan Migliorisi answered Oct 01 '22 15:10



It seems like you could adapt solution 1 there (assuming your _id key is an auto-incrementing value): do a count on your records, use that as the upper limit for a random int in C++, then grab the document with that _id.

Likewise, if you don't have an auto-incrementing _id key, just create one on your documents. An additional field holding an INT shouldn't add that much to your document size.

The MongoDB documentation describes how to quickly add such a field; see "Auto Inc Field".
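A rough sketch of the count-then-random-id idea described above, again without real driver calls: an `std::unordered_map` stands in for a collection keyed by an integer id, `size()` for the count query, and `find` for the lookup by id. The names `Collection`, `random_id` and `random_document_by_id` are hypothetical, and the sketch assumes ids run from 1 to count without gaps.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <random>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a collection keyed by an auto-incremented
// integer id (assumed to be 1, 2, ..., count).
using Collection = std::unordered_map<std::int64_t, std::string>;

// Uniformly random id in [1, count].
template <class Rng>
std::int64_t random_id(std::int64_t count, Rng& rng) {
    assert(count > 0);
    std::uniform_int_distribution<std::int64_t> pick(1, count);
    return pick(rng);
}

// Counts the documents, draws a random id, and looks that id up.
// Returns nothing if the collection is empty or the drawn id is missing
// (for example, because a document was deleted and left a gap).
template <class Rng>
std::optional<std::string> random_document_by_id(const Collection& docs,
                                                 Rng& rng) {
    if (docs.empty()) return std::nullopt;
    const auto count = static_cast<std::int64_t>(docs.size());  // "count" query
    auto it = docs.find(random_id(count, rng));                 // lookup by id
    if (it == docs.end()) return std::nullopt;
    return it->second;
}
```

If documents are ever deleted, the ids stop being contiguous and the count no longer matches the highest id, so a real implementation would need to retry on a miss or track the maximum id separately.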

Petrogad answered Oct 01 '22 13:10
