 

Repeatable pagination with randomly ordered rows

I have an API that returns some paginated rows from the DB. It works; however, when I order rows by RANDOM() I get duplicates on consecutive pages. Is there any option to set a random seed per query?

If not, is it possible to set the random SEED globally to force RANDOM() to generate the same values per query? Then I could just change the global seed every 3 minutes or something like that...


I use this code:

SELECT * FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5

Now I want to pass a seed to this query so I can paginate random results. Should I do it like this?

SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5
SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 10 LIMIT 5

And results will be correctly paginated?

user606521 asked Aug 25 '14 09:08

2 Answers

If the order needs to be "shuffled" but not truly random...

(Update: see my other answer for a more flexible and randomizable solution.)

You say "random" order, which is what you get when calling ORDER BY random() - for each row, PostgreSQL calls random(), gets a value, and uses that to decide how to sort that row within the set of results.

To make this repeatable, you have to mess with seeds. This feels icky. According to the docs:

the effects will persist until the end of the session, unless overridden by another SET

I think this means that when using a connection pool, setseed mutates the connection for the next process that uses that connection.
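If you do go the seed route, a safer pattern (a sketch, not tested against your setup) is to call setseed in its own statement on the same session, then run the paginated query. Putting setseed in the select list of the ORDER BY random() query itself, as in the question, likely re-seeds on every row, since setseed is a volatile function evaluated per row:

```sql
-- Seed once for the session, then run each page's query
-- on that SAME connection (pin the connection if pooling).
SELECT setseed(0.1);
SELECT * FROM "table" ORDER BY random() OFFSET 5 LIMIT 5;
```

You would still need to re-run setseed before every page request, and make sure the pool hands you the same (or a freshly re-seeded) connection each time.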

What about modulo?

I have a case where I don't need true randomness. My criteria are:

  • not the same order every time
  • predictable order within pages of the same result set, so that we don't get duplicates on subsequent pages

Eg, this would be fine:

  • Listing 1
    • page 1: items 1, 4
    • page 2: items 3, 2
  • Listing 2 (different user, or same user coming back later)
    • page 1: items 3, 1
    • page 2: items 2, 4

To get something like this, modulo seems to work well. Eg, ORDER BY id % 7, id for all pages of request 1, and ORDER BY id % 11, id for all pages of request 2. That is, for each row, divide its id by the modulus and sort by the remainder. Within rows with the same remainder, sort by id (to ensure the sort is stable).

The modulus could be picked randomly for the first page, then reused as a parameter for each subsequent page request.
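In practice that could look like this (hypothetical table and column names; page size of 5 as in the question):

```sql
-- Request 1 happened to pick modulus 7; every page of that
-- listing reuses it, so pages never overlap.
SELECT * FROM my_table ORDER BY id % 7, id LIMIT 5 OFFSET 0;   -- page 1
SELECT * FROM my_table ORDER BY id % 7, id LIMIT 5 OFFSET 5;   -- page 2

-- Request 2 picked modulus 11, giving a different shuffle.
SELECT * FROM my_table ORDER BY id % 11, id LIMIT 5 OFFSET 0;  -- page 1
```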

You can see how this might work for your database like this:

echo "select id, id % 77 FROM my_table ORDER BY id % 77, id" | psql my_db > sort.txt

A prime modulus will probably give you the most variation. And if your ids start at 1 (such that % 77 would make the first 77 rows return in the normal order), you could try doing a modulus on a timestamp field instead. Eg:

ORDER BY (extract(epoch from inserted_at)* 100000)::bigint % 77

But you'd need an index on that expression to make it performant.
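A sketch of such an expression index (assuming inserted_at is a timestamp without time zone, so that extract(epoch from ...) on it is immutable and therefore indexable):

```sql
-- Index the exact expression used in ORDER BY so PostgreSQL can
-- read rows in pre-sorted order instead of sorting at query time.
CREATE INDEX my_table_epoch_mod_idx
    ON my_table (((extract(epoch from inserted_at) * 100000)::bigint % 77));
```

Note the trade-off: the modulus is baked into the index, so this only helps queries using that one modulus. If the modulus varies per request, the index won't be used.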

Nathan Long answered Oct 14 '22 06:10

With this UNION ALL technique the random order is repeatable:

select a, b
from (
    -- this row runs setseed first, before random() is evaluated
    select setseed(0.1), null as a, null as b

    union all

    select null, a, b
    from t

    offset 1  -- discard the setseed row itself
) s
order by random()
offset 0
limit 5
;
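Subsequent pages then reuse the same seed and just advance the outer offset; because the sequence of random() values is identical on each run, the overall order is the same and pages don't overlap (sketch):

```sql
-- page 2: same seed, next offset
select a, b
from (
    select setseed(0.1), null as a, null as b
    union all
    select null, a, b
    from t
    offset 1
) s
order by random()
offset 5
limit 5;
```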
Clodoaldo Neto answered Oct 14 '22 06:10