A common solution to high performance pagination is to use an indexed field, starting each new "page" from the last value of the prior page. For example, with a dataset like this (assuming Category and ID are the primary key):
Category | ID | Name
Red | 10 | Bob Jones
Red | 14 | Sam Smith
Red | 16 | Jill White
Blue | 10 | Mike Green
Blue | 16 | Mary Brown
Assuming a (rather small) page size of 1, if we want to return all of the Red
category records (assume ORDER BY Category, ID):
SELECT * FROM table WHERE Category='Red' AND ID>'00' (1st page, returns Bob Jones)
SELECT * FROM table WHERE Category='Red' AND ID>'10' (2nd page, returns Sam Smith)
SELECT * FROM table WHERE Category='Red' AND ID>'14' (3rd page, returns Jill White)
This works because by pagination "keyset" is using only the ID field (and it would also work on multiple fields if ID were globally unique, which it is not).
But if I want to return all the Red and Blue records (assuming that the table also contains other Categories), still one page at a time (assume ORDER BY Category, ID):
SELECT * FROM table WHERE Category IN ['Red', 'Blue'] AND Category>'' AND ID>'00' (1st page, returns Bob Jones)
SELECT * FROM table WHERE Category IN ['Red', 'Blue'] AND Category>'Red' AND ID>'10' (2nd page, returns Sam Smith, but skips Mike Green)
In PostgreSQL and some others, there is a "row values" predicate syntax that supports this (assume ORDER BY Category, ID):
SELECT * FROM table WHERE (Category, ID) > ('', '00') (1st page, returns Bob Jones)
SELECT * FROM table WHERE (Category, ID) > ('Red', '10') (2nd page, returns Sam Smith)
It works because both Category and ID are treated as a single compound value for the purpose of the test. But I'm not using PostgreSQL or a database that supports "row values". So the question is if there is an alternative solution that would work for this (whether there are 2 or n fields)? For it to work for pagination on multiple variable fields, I need to device a predicate that will always find the "next record" in the multi-field sort order.
PS: OFFSET/LIMIT or SKIP/LIMIT pagination works of course, but neither is efficient on large data sets, which is why I'm trying to use "keyset" pagination.
Keyset pagination (also known as the "seek method") is used to fetch a subset of records from a table quickly. It does this by restricting the set of records returned with a combination of WHERE and LIMIT clauses.
The seek method is based on filtering out the data from the previous pages. We do this by having the client send the ID of the last record listed. We take that ID and place it in the WHERE clause providing us with only relevant data.
You can always phrase the predicate:
(x, y) > (a, b)
as:
x >= a and (x = a and y > b or x > a)
Note the first preficate x >= a
promotes (it doesn't ensure) the usage of an index on that column. That is, it becomes an "access predicate". The second one x = a and y > b or x > a
filters out the rows in excess, effectively becoming a "filtering predicate".
This way of phrasing "tuple inequality" predicates promotes the usage of indexes. However, they become increasingly complex if you are comparing 3, 4, or more columns.
Expanding on The Impaler's answer, the generic syntax for keyset pagination with composite keys is the following:
WHERE
(x > a) OR
(x = a AND y > b) OR
(x = a AND y = b AND z > c) OR
...
This is not as nice as (x, y, z) > (a, b, c)
, but you can generate the SQL in your language of choice. You iterate through the set of composite fields and expand each successive field to include {field} = {value} AND
of the previous fields.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With