
Random document in ElasticSearch

Is there a way to get a truly random sample from an elasticsearch index? i.e. a query that retrieves any document from the index with probability 1/N (where N is the number of documents currently indexed)?

And as a follow-up question: if all documents have some numeric field s, is there a way to get a document through weighted random sampling, i.e. where the probability to get document i with value s_i is equal to s_i / sum(s_j for j in index)?

asked Sep 17 '14 10:09 by mitchus



2 Answers

I know this is an old question, but it is now possible to use random_score inside a function_score query, like this:

    {
      "size": 1,
      "query": {
        "function_score": {
          "functions": [
            {
              "random_score": {
                "seed": "1477072619038"
              }
            }
          ]
        }
      }
    }

For me it is very fast with about 2 million documents.

I use the current timestamp as the seed, but you can use anything you like. The useful part is that the same seed always gives the same ordering, so if you use each user's session ID as the seed, every user gets a different order that stays stable for that user.
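To make the seed idea concrete, here is a minimal Python sketch that only builds the request body shown above; it does not talk to a cluster. The function name `random_doc_query`, the session ID string, and the optional `field` argument are assumptions for illustration (newer Elasticsearch releases may expect a field alongside the seed, so check the documentation for your version).

```python
import time


def random_doc_query(seed=None, size=1, field=None):
    """Build a search body that orders hits by a seeded random score.

    Hypothetical helper: it only assembles the JSON body as a dict.
    """
    if seed is None:
        # Fall back to the current timestamp in milliseconds.
        seed = int(time.time() * 1000)
    random_score = {"seed": seed}
    if field is not None:
        # Assumption: some versions want a field next to the seed.
        random_score["field"] = field
    return {
        "size": size,
        "query": {
            "function_score": {
                "functions": [{"random_score": random_score}]
            }
        },
    }


# Same (hypothetical) session ID -> identical body, hence the same ordering.
body_a = random_doc_query(seed="session-abc")
body_b = random_doc_query(seed="session-abc")
# No seed given -> timestamp seed, so a fresh ordering on each call.
body_c = random_doc_query()
```

The resulting dict can be serialized with `json.dumps` and sent as the search body with whatever client you already use.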

answered Sep 22 '22 15:09 by Adam Wallner



The only way I know of to get random documents from an index (at least in versions <= 1.3.1) is to use a script:

    sort: {
      _script: {
        script: "Math.random() * 200000",
        type: "number",
        params: {},
        order: "asc"
      }
    }

You can adapt that script to weight the random sort key by a field of each document.
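As a rough illustration of what such a weighting could compute, here is a Python sketch of one known technique (the Efraimidis-Spirakis key method), run locally on made-up data rather than inside Elasticsearch. Each document gets the key u^(1/s_i), with u uniform in [0, 1), and the document with the largest key wins; that selects document i with probability s_i / sum(s_j). The names `weighted_pick` and `weights` and the sample values are hypothetical. A sort script along the lines of `Math.pow(Math.random(), 1.0 / doc['s'].value)` with descending order would presumably do the same thing, but that is an untested assumption.

```python
import random
from collections import Counter


def weighted_pick(weights, rng):
    """Return the id whose key u ** (1 / s) is largest, u uniform in [0, 1)."""
    best_id, best_key = None, -1.0
    for doc_id, s in weights.items():
        if s <= 0:
            continue  # non-positive weights can never be selected
        key = rng.random() ** (1.0 / s)
        if key > best_key:
            best_id, best_key = doc_id, key
    return best_id


# Made-up documents with field s; expected shares are 0.1, 0.2 and 0.7.
weights = {"a": 1.0, "b": 2.0, "c": 7.0}
rng = random.Random(42)  # fixed seed so the run is repeatable
trials = 20000
counts = Counter(weighted_pick(weights, rng) for _ in range(trials))
freqs = {doc_id: counts[doc_id] / trials for doc_id in weights}
print(freqs)
```

Over many draws the observed frequencies should approach s_i / sum(s_j), which is the distribution asked for in the follow-up question.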

A more sophisticated built-in option may be added in the future, but you'd likely have to request that from the ES team.

answered Sep 23 '22 15:09 by Alcanzar
