Select random row from a PostgreSQL table with weighted row probabilities

Example input:

    SELECT * FROM test;
     id | percent
    ----+---------
      1 |      50
      2 |      35
      3 |      15
    (3 rows)

How would you write a query such that, on average, it returns the row with id=1 50% of the time, the row with id=2 35% of the time, and the row with id=3 15% of the time?

I tried something like SELECT id FROM test ORDER BY p * random() DESC LIMIT 1 (p being the percent column), but it gives the wrong distribution. After 10,000 runs I get {1=6293, 2=3302, 3=405}, whereas I expected something close to {1=5000, 2=3500, 3=1500}.

Any ideas?

asked Oct 23 '12 22:10

Oleg Golovanov

2 Answers

This should do the trick:

    WITH CTE AS (
        SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R
    )
    SELECT *
    FROM (
        SELECT id, SUM(percent) OVER (ORDER BY id) S, R
        FROM YOUR_TABLE CROSS JOIN CTE
    ) Q
    WHERE S >= R
    ORDER BY id
    LIMIT 1;

The sub-query Q gives the following running totals (id, S):

    1   50
    2   85
    3  100

We then simply generate a random number in the range [0, 100) and pick the first row whose running total is at or beyond that number (the WHERE clause). We use a common table expression (WITH) to ensure the random number is calculated only once.

BTW, the SELECT SUM(percent) FROM YOUR_TABLE sub-query lets you use arbitrary weights in percent; they don't strictly need to be percentages (i.e. add up to 100).
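
To illustrate the point about arbitrary weights, here is a minimal self-contained sketch of the same query. The inline table weights and the CTE name pick are made up for this example and are not part of the original answer; the weights 5, 3 and 2 sum to 10 rather than 100.

```sql
-- Hypothetical inline data: weights 5, 3, 2 (sum = 10, not 100).
WITH weights(id, percent) AS (
    VALUES (1, 5), (2, 3), (3, 2)
),
-- One random threshold in [0, total weight); the CTE yields a single row,
-- so every row of the cross join below sees the same value of r.
pick AS (
    SELECT random() * (SELECT SUM(percent) FROM weights) AS r
)
SELECT id
FROM (
    -- Running total of the weights in id order: 5, 8, 10.
    SELECT id, SUM(percent) OVER (ORDER BY id) AS s, r
    FROM weights CROSS JOIN pick
) q
WHERE s >= r        -- keep rows whose running total reaches the threshold
ORDER BY id         -- the first such row is the selected one
LIMIT 1;
```

With these weights, id 1 covers thresholds up to 5, id 2 those between 5 and 8, and id 3 those between 8 and 10, so the rows should come back roughly 50%, 30% and 20% of the time.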

[SQL Fiddle]

answered Sep 20 '22 05:09

Branko Dimitrijevic

ORDER BY random() ^ (1.0 / p)

from the algorithm described by Efraimidis and Spirakis.
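
Spelled out against the question's table, the expression above might be used as in the sketch below. It assumes that p stands for the percent column of test, that every weight is strictly positive (a zero weight would cause a division by zero), and that the sort is descending, since the algorithm keeps the rows with the largest keys.

```sql
-- Sketch only: weighted pick of one row from the question's table "test".
-- Each row gets the key random() ^ (1 / weight); the row with the
-- largest key is returned.
SELECT id
FROM test
ORDER BY random() ^ (1.0 / percent) DESC
LIMIT 1;
```

For a single pick, the chance that a given row has the largest key is its weight divided by the sum of all weights. Raising the LIMIT returns several distinct rows, i.e. weighted sampling without replacement.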

answered Sep 20 '22 05:09

Mechanic Wei
