
Random order & pagination Elasticsearch

This issue is a feature request for ordering with an optional seed, which would allow a random order to be recreated.

I need to be able to paginate randomly ordered results. How could this be done with Elasticsearch 0.19.1?

Thanks.

Yeggeps asked Mar 20 '12 23:03



2 Answers

This should be considerably faster than both answers above and supports seeding:

curl -XGET 'localhost:9200/_search' -d '{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "random_score": {}
    }
  }
}'

See: https://github.com/elasticsearch/elasticsearch/issues/1170
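To tie this back to pagination, here is a minimal sketch that builds the same function_score request body with a seed and a from/size window. It assumes an Elasticsearch version that supports function_score and a "seed" option on random_score; the helper name random_page_query, its parameters, and the seed value are made up for illustration, and newer releases may also expect a field alongside the seed, so check the documentation for your version.

```python
import json


def random_page_query(seed, page, size=10):
    """Build a search body for one page of seeded random results.

    `seed`, `page`, and `size` are hypothetical names for this sketch;
    pages are zero-based.
    """
    return {
        "from": page * size,   # offset of the first hit on this page
        "size": size,          # number of hits per page
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # Assumption: the same seed reproduces the same ordering.
                "random_score": {"seed": seed},
            }
        },
    }


# Reuse one seed for every page of a single browsing session.
body = random_page_query(seed=12345, page=2, size=10)
print(json.dumps(body, indent=2))
```

The idea is to pick a seed once per user session and send it with every page request, so consecutive pages come from the same shuffled order rather than a fresh shuffle each time.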

Nariman answered Sep 27 '22 16:09


You can sort using a hash function of a unique field (for example id) and a random salt. Depending on how truly random the results should be, you can do something as primitive as:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : {
      "script" : "(doc['_id'].value + salt).hashCode()",
      "type" : "number",
      "params" : {
        "salt" : "some_random_string"
      },
      "order" : "asc"
    }
  }
}

or something as sophisticated as

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : {
      "script" : "org.elasticsearch.common.Digest.md5Hex(doc['_id'].value + salt)",
      "type" : "string",
      "params" : {
        "salt" : "some_random_string"
      },
      "order" : "asc"
    }
  }
}

The second example will produce more random results but will be somewhat slower.

For this approach to work, the _id field has to be stored; otherwise, the query will fail with a NullPointerException.
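Why a fixed salt makes pagination work can be seen in a small local simulation. This is only a sketch of the idea in plain Python, not code that runs inside Elasticsearch: the document ids are made up, and salted_order and page are hypothetical helper names. It sorts ids by the MD5 hex digest of id + salt, as in the second query, and then slices the ordering into pages.

```python
import hashlib


def salted_order(ids, salt):
    """Sort ids by the MD5 hex digest of id + salt (local simulation only)."""
    return sorted(
        ids,
        key=lambda doc_id: hashlib.md5((doc_id + salt).encode("utf-8")).hexdigest(),
    )


def page(ids, salt, page_number, size):
    """Return one zero-based page of the salted ordering."""
    ordered = salted_order(ids, salt)
    start = page_number * size
    return ordered[start:start + size]


ids = [str(n) for n in range(25)]   # made-up document ids
salt = "some_random_string"         # keep fixed while a user is paging

# Walk through all three pages with the same salt.
pages = [page(ids, salt, p, 10) for p in range(3)]
seen = [doc_id for chunk in pages for doc_id in chunk]
print(pages[0])
```

Because the ordering depends only on the ids and the salt, every page request with the same salt sees the same sequence, so no document is repeated or skipped across pages. Changing the salt gives a new shuffle.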

imotov answered Sep 27 '22 17:09