Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generic SQL predicate to use for keyset pagination on multiple fields

A common solution to high performance pagination is to use an indexed field, starting each new "page" from the last value of the prior page. For example, with a dataset like this (assuming Category and ID are the primary key):

Category | ID | Name
Red      | 10 | Bob Jones
Red      | 14 | Sam Smith
Red      | 16 | Jill White
Blue     | 10 | Mike Green
Blue     | 16 | Mary Brown

Assuming a (rather small) page size of 1, if we want to return all of the Red category records (assume ORDER BY Category, ID):

SELECT * FROM table WHERE Category='Red' AND ID>'00' (1st page, returns Bob Jones)
SELECT * FROM table WHERE Category='Red' AND ID>'10' (2nd page, returns Sam Smith)
SELECT * FROM table WHERE Category='Red' AND ID>'14' (3rd page, returns Jill White)

This works because by pagination "keyset" is using only the ID field (and it would also work on multiple fields if ID were globally unique, which it is not).

But if I want to return all the Red and Blue records (assuming that the table also contains other Categories), still one page at a time (assume ORDER BY Category, ID):

SELECT * FROM table WHERE Category IN ['Red', 'Blue'] AND Category>'' AND ID>'00' (1st page, returns Bob Jones)
SELECT * FROM table WHERE Category IN ['Red', 'Blue'] AND Category>'Red' AND ID>'10' (2nd page, returns Sam Smith, but skips Mike Green)

In PostgreSQL and some others, there is a "row values" predicate syntax that supports this (assume ORDER BY Category, ID):

SELECT * FROM table WHERE (Category, ID) > ('', '00') (1st page, returns Bob Jones)
SELECT * FROM table WHERE (Category, ID) > ('Red', '10') (2nd page, returns Sam Smith)

It works because both Category and ID are treated as a single compound value for the purpose of the test. But I'm not using PostgreSQL or a database that supports "row values". So the question is if there is an alternative solution that would work for this (whether there are 2 or n fields)? For it to work for pagination on multiple variable fields, I need to device a predicate that will always find the "next record" in the multi-field sort order.

PS: OFFSET/LIMIT or SKIP/LIMIT pagination works of course, but neither is efficient on large data sets, which is why I'm trying to use "keyset" pagination.

like image 749
eric Avatar asked Jun 22 '19 22:06

eric


People also ask

How does Keyset pagination work?

Keyset pagination (also known as the "seek method") is used to fetch a subset of records from a table quickly. It does this by restricting the set of records returned with a combination of WHERE and LIMIT clauses.

What is seek pagination?

The seek method is based on filtering out the data from the previous pages. We do this by having the client send the ID of the last record listed. We take that ID and place it in the WHERE clause providing us with only relevant data.


2 Answers

You can always phrase the predicate:

(x, y) > (a, b)

as:

x >= a and (x = a and y > b or x > a)

Note the first preficate x >= a promotes (it doesn't ensure) the usage of an index on that column. That is, it becomes an "access predicate". The second one x = a and y > b or x > a filters out the rows in excess, effectively becoming a "filtering predicate".

This way of phrasing "tuple inequality" predicates promotes the usage of indexes. However, they become increasingly complex if you are comparing 3, 4, or more columns.

like image 92
The Impaler Avatar answered Nov 06 '22 01:11

The Impaler


Expanding on The Impaler's answer, the generic syntax for keyset pagination with composite keys is the following:

WHERE
  (x > a) OR
  (x = a AND y > b) OR
  (x = a AND y = b AND z > c) OR
  ...

This is not as nice as (x, y, z) > (a, b, c), but you can generate the SQL in your language of choice. You iterate through the set of composite fields and expand each successive field to include {field} = {value} AND of the previous fields.

like image 34
Jake Z Avatar answered Nov 06 '22 02:11

Jake Z