Example input:
SELECT * FROM test; id | percent ----+---------- 1 | 50 2 | 35 3 | 15 (3 rows)
How would you write such query, that on average 50% of time i could get the row with id=1, 35% of time row with id=2, and 15% of time row with id=3?
I tried something like SELECT id FROM test ORDER BY p * random() DESC LIMIT 1
, but it gives wrong results. After 10,000 runs I get a distribution like: {1=6293, 2=3302, 3=405}
, but I expected the distribution to be nearly: {1=5000, 2=3500, 3=1500}
.
Any ideas?
There's an easy way to show a random record in a table: SELECT * FROM table_name ORDER BY RANDOM() LIMIT 1; But this query might take a while to finish as it reads the whole table first then take out a random record from it.
The PostgreSQL SELECT statement retrieves data from a single or several tables in a database, and returns the data in a result table, called a result-set. Use the SELECT statement to return one or more rows matching the specified criteria from the database tables.
This should do the trick:
WITH CTE AS ( SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R ) SELECT * FROM ( SELECT id, SUM(percent) OVER (ORDER BY id) S, R FROM YOUR_TABLE CROSS JOIN CTE ) Q WHERE S >= R ORDER BY id LIMIT 1;
The sub-query Q
gives the following result:
1 50 2 85 3 100
We then simply generate a random number in range [0, 100) and pick the first row that is at or beyond that number (the WHERE
clause). We use common table expression (WITH
) to ensure the random number is calculated only once.
BTW, the SELECT SUM(percent) FROM YOUR_TABLE
allows you to have any weights in percent
- they don't strictly need to be percentages (i.e. add-up to 100).
[SQL Fiddle]
ORDER BY random() ^ (1.0 / p)
from the algorithm described by Efraimidis and Spirakis.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With