Fetching a random record from the Google App Engine Datastore?

I have a datastore with around 1,000,000 entities in a model. I want to fetch 10 random entities from this.

I am not sure how to do this. Can someone help?

demos asked Jun 09 '10 03:06


2 Answers

Assign each entity a random number and store it in the entity. Then query for ten records whose random number is greater than (or less than) some other random number.

This isn't totally random, however, since entities with nearby random numbers will tend to show up together. If you want to beat this, do ten queries based around ten random numbers, but this will be less efficient.
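The idea above can be sketched without the datastore at all. The following is a minimal in-memory illustration, not App Engine code: `RandomKeyedStore`, `put`, and `fetch_from_random_point` are hypothetical names, a sorted list stands in for an indexed `rand_num` property, and the wrap-around when the pivot lands near the top of the range is an added assumption that the answer does not mention.

```python
import bisect
import random


class RandomKeyedStore:
    """In-memory stand-in for a datastore kind with an indexed rand_num property."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._rows = []  # kept sorted as (rand_num, entity) pairs

    def put(self, entity):
        # Store a random number alongside the entity at write time.
        bisect.insort(self._rows, (self._rng.random(), entity))

    def fetch_from_random_point(self, n):
        # Roughly: ORDER BY rand_num WHERE rand_num >= pivot LIMIT n.
        n = min(n, len(self._rows))
        pivot = self._rng.random()
        start = bisect.bisect_left(self._rows, (pivot,))
        rows = self._rows[start:start + n]
        if len(rows) < n:
            # Assumption: wrap around to the smallest keys when the pivot
            # is too close to the top of the range to yield n rows.
            rows += self._rows[:n - len(rows)]
        return [entity for _, entity in rows]


store = RandomKeyedStore()
for i in range(1000):
    store.put(i)
print(store.fetch_from_random_point(10))
```

The ten results are neighbours in `rand_num` order, which is exactly the clustering the answer warns about; calling `fetch_from_random_point(1)` ten times trades that clustering for ten separate queries.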

Jason Hall answered Nov 19 '22 11:11


Jason Hall's answer and the one here aren't horrible, but as he mentions, they are not really random either. Even doing ten queries will not be random if, for example, the random numbers are all grouped together. To keep things truly random, here are two possible solutions:

Solution 1

Assign an index to each datastore object, keep track of the maximum index, and randomly select an index every time you want to get a random record:

MyObject.all().filter('index =', random.randrange(0, maxindex + 1)).get()

Up-side: Truly random. Fast.

Down-side: You have to properly maintain the indices when adding and deleting objects, which can make both operations O(N).
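As a rough illustration of Solution 1, here is an in-memory sketch in which a dict plays the role of the indexed `index` property. The names `IndexedStore`, `add`, `delete`, and `random_sample` are hypothetical. The delete strategy shown (moving the highest-indexed entity into the gap) is one possible way to keep indices dense and is an assumption of this sketch, not something the answer prescribes; on a real datastore, the counter and the move would also need to be updated transactionally.

```python
import random


class IndexedStore:
    """Entities addressed by a dense integer index 0..count-1."""

    def __init__(self):
        self._by_index = {}  # index -> entity
        self.count = 0       # equals maxindex + 1

    def add(self, entity):
        # New entities take the next free index.
        index = self.count
        self._by_index[index] = entity
        self.count += 1
        return index

    def delete(self, index):
        last = self.count - 1
        if not 0 <= index <= last:
            raise IndexError(index)
        # Assumption: fill the gap with the last entity so indices stay dense.
        self._by_index[index] = self._by_index[last]
        del self._by_index[last]
        self.count -= 1

    def random_entity(self, rng=random):
        # Same selection as random.randrange(0, maxindex + 1) in the answer.
        return self._by_index[rng.randrange(0, self.count)]

    def random_sample(self, k, rng=random):
        # k distinct indices, then one lookup per index.
        return [self._by_index[i] for i in rng.sample(range(self.count), k)]


store = IndexedStore()
for name in ["a", "b", "c", "d", "e"]:
    store.add(name)
store.delete(1)
print(store.random_sample(3))
```

Drawing the ten indices with `random.sample` guarantees ten distinct entities, which ten independent `randrange` calls would not.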

Solution 2

Assign a random number to each datastore object when it is created. Then, to get a random record the first time, query for a record with a random number greater than or equal to some other random number, ordered by the random numbers (i.e. MyObject.all().order('rand_num').filter('rand_num >=', random.random())). Then save that query's cursor in memcache. To get a random record after the first time, load the cursor from memcache and go to the next item. If there is no next item, run the query again.

To prevent the sequence of objects from repeating, on every datastore read, give the entity you just read a new random number and save it back to the datastore.

Up-side: Truly random. No complex indices to maintain.

Down-side: Need to keep track of a cursor. Need to do a put every time you get a random record.
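The flow of Solution 2 can be simulated in memory to make the steps concrete. This is only a sketch under stated assumptions: `CursorSampler` and `next_random` are hypothetical names, a plain dict stands in for memcache, and the "cursor" is modelled as the `rand_num` of the last record read rather than a real datastore cursor.

```python
import bisect
import random


class CursorSampler:
    """Simulates: query by rand_num, remember a cursor, re-key after each read."""

    def __init__(self, entities, rng=None):
        self._rng = rng or random.Random()
        self._rows = sorted((self._rng.random(), e) for e in entities)
        self._cache = {}  # stand-in for memcache

    def next_random(self):
        if not self._rows:
            raise LookupError("store is empty")
        # Rebuilt on every call for simplicity; a real index avoids this.
        keys = [k for k, _ in self._rows]
        cursor = self._cache.get("cursor")
        i = len(keys)
        if cursor is not None:
            # Continue from the cursor: first key strictly after it.
            i = bisect.bisect_right(keys, cursor)
        while i >= len(keys):
            # No cursor yet, or nothing left after it: run the query again
            # with a fresh pivot (first key >= pivot).
            i = bisect.bisect_left(keys, self._rng.random())
        _, entity = self._rows.pop(i)
        self._cache["cursor"] = keys[i]
        # Give the entity a new random number and "put" it back.
        bisect.insort(self._rows, (self._rng.random(), entity))
        return entity


sampler = CursorSampler(range(50))
print([sampler.next_random() for _ in range(10)])
```

Each call performs one read and one write, which mirrors the down-side listed above: every random fetch costs a put.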

speedplane answered Nov 19 '22 11:11