I have API that returns dome paginated rows from DB. It works, however when I order rows by RANDOM()
I get duplicates on consecutive pages. Is there any option to set random seed per query?
If not is it possible to set random SEED globally to force RANDOM()
to generate same values per query? Then I could just change global random every 3 minutes or something like that...
U use this code:
SELECT * FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5
Now I want pass seed to this query so I can paginate random results. I should do this like this?:
SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5
SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 10 LIMIT 5
And results will be correctly paginated?
(Update: see my other answer for a more flexible and randomizable solution.)
You say "random" order, which is what you get when calling ORDER BY random()
- for each row, PostgreSQL calls random()
, gets a value, and uses that to decide how to sort that row within the set of results.
To make this repeatable, you have to mess with seeds. This feels icky. According to the docs:
the effects will persist until the end of the session, unless overridden by another SET
I think this means that when using a connection pool, setseed
mutates the connection for the next process that uses that connection.
I have a case where I don't need true randomness. My criteria are:
Eg, this would be fine:
To get something like this, modulo seems to work well. Eg, ORDER BY id % 7, id
for all pages of request 1, and ORDER BY id % 11, id
for all pages of request 2. That is, for each row, divide its id by the modulus and sort by the remainder. Within rows with the same remainder, sort by id (to ensure the sort is stable).
The modulus could be picked randomly for the first page, then reused as a parameter for each subsequent page request.
You can see how this might work for your database like this:
echo "select id, id % 7 FROM my_table ORDER BY id % 77, id" | psql my_db > sort.txt
A prime modulus will probably give you the most variation. And if your ids start at 1 (such that % 77
would make the first 77 rows return in the normal order), you could try doing a modulus on a timestamp field instead. Eg:
ORDER BY (extract(epoch from inserted_at)* 100000)::bigint % 77
But you'd need a function index to make that performant.
With this union all
technique the random order is repeatable
select a, b
from (
select setseed(0.1), null as a, null as b
union all
select null, a, b
from t
offset 1
) s
order by random()
offset 0
limit 5
;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With