Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select random row from a PostgreSQL table with weighted row probabilities

Tags:

Example input:

 SELECT * FROM test;  id | percent    ----+----------   1 | 50    2 | 35      3 | 15    (3 rows) 

How would you write such query, that on average 50% of time i could get the row with id=1, 35% of time row with id=2, and 15% of time row with id=3?

I tried something like SELECT id FROM test ORDER BY p * random() DESC LIMIT 1, but it gives wrong results. After 10,000 runs I get a distribution like: {1=6293, 2=3302, 3=405}, but I expected the distribution to be nearly: {1=5000, 2=3500, 3=1500}.

Any ideas?

like image 686
Oleg Golovanov Avatar asked Oct 23 '12 22:10

Oleg Golovanov


People also ask

How do I select a random row in PostgreSQL?

There's an easy way to show a random record in a table: SELECT * FROM table_name ORDER BY RANDOM() LIMIT 1; But this query might take a while to finish as it reads the whole table first then take out a random record from it.

How do I select a record in PostgreSQL?

The PostgreSQL SELECT statement retrieves data from a single or several tables in a database, and returns the data in a result table, called a result-set. Use the SELECT statement to return one or more rows matching the specified criteria from the database tables.


2 Answers

This should do the trick:

WITH CTE AS (     SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R ) SELECT * FROM (     SELECT id, SUM(percent) OVER (ORDER BY id) S, R     FROM YOUR_TABLE CROSS JOIN CTE ) Q WHERE S >= R ORDER BY id LIMIT 1; 

The sub-query Q gives the following result:

1  50 2  85 3  100 

We then simply generate a random number in range [0, 100) and pick the first row that is at or beyond that number (the WHERE clause). We use common table expression (WITH) to ensure the random number is calculated only once.

BTW, the SELECT SUM(percent) FROM YOUR_TABLE allows you to have any weights in percent - they don't strictly need to be percentages (i.e. add-up to 100).

[SQL Fiddle]

like image 83
Branko Dimitrijevic Avatar answered Sep 20 '22 05:09

Branko Dimitrijevic


ORDER BY random() ^ (1.0 / p)

from the algorithm described by Efraimidis and Spirakis.

like image 38
Mechanic Wei Avatar answered Sep 20 '22 05:09

Mechanic Wei