 

pagination and filtering on a very large table in postgresql (keyset pagination?)

I have a scientific database, fed by an API, that currently holds 4,300,000 records. By June 2020, it will probably hold about 100,000,000 records.

This is the layout of the table 'output':

ID   | sensor_ID    | speed    | velocity | direction
-----------------------------------------------------
1    | 1            | 10       | 1        | up
2    | 2            | 12       | 2        | up
3    | 2            | 11.5     | 1.5      | down
4    | 1            | 9.5      | 0.8      | down
5    | 3            | 11       | 0.75     | up
...

BTW, this is dummy data. But output is a table with 5 columns: ID, sensor_ID, speed, velocity and direction.

What I want to achieve is a decent pagination and filter method. I want to create a website (in node.js) where these 4,000,000+ records (for now) will be displayed, 10,000 records per page. I also want to be able to filter on sensor_ID, speed, velocity or direction.
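As a reference point for combining the filter with pagination, here is a minimal sketch of a query builder in node.js. `buildPageQuery` is a hypothetical helper (my own name, not an existing API) that folds an optional sensor_ID filter and a keyset-style `ID <` condition into one parameterized statement:

```javascript
// Hypothetical helper: builds a parameterized page query for the
// 'output' table. sensorId filters; lastSeenId is the keyset cursor
// (omit it for the first page); pageSize is the LIMIT.
function buildPageQuery({ sensorId = null, lastSeenId = null, pageSize = 10000 }) {
  const conditions = [];
  const values = [];
  if (sensorId !== null) {
    values.push(sensorId);
    conditions.push(`sensor_ID = $${values.length}`);
  }
  if (lastSeenId !== null) {
    values.push(lastSeenId);
    conditions.push(`ID < $${values.length}`);
  }
  const where = conditions.length ? ` WHERE ${conditions.join(' AND ')}` : '';
  values.push(pageSize);
  const text =
    `SELECT ID, sensor_ID, speed, velocity, direction FROM output${where}` +
    ` ORDER BY ID DESC LIMIT $${values.length}`;
  return { text, values };
}
```

The same builder serves both the unfiltered and the filtered case, so the pagination logic does not have to change when a filter is applied.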

For now, I have this query for selecting specific rows:

SELECT * FROM output ORDER BY ID DESC OFFSET 0 LIMIT 10000 // first 10,000 rows

SELECT * FROM output ORDER BY ID DESC OFFSET 10000 LIMIT 10000 // next 10,000 rows

...

I'm searching for some information/tips about creating a decent pagination method. For now, it's still quite fast the way I do it, but I think it will get a lot slower when we hit 50,000,000+ records.

First of all, I found this page: https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/. I'm interested in the keyset pagination. But to be honest, I have no clue how to start.

What I think I must do:

Create an index on the ID-field:

CREATE UNIQUE INDEX index_id ON output USING btree (ID)

I also found this page: https://leopard.in.ua/2014/10/11/postgresql-paginattion. When you scroll down to "Improvement #2: The Seek Method", you can see that they dropped the OFFSET-clause, and are using a WHERE-clause. I also see that they are using the last insert ID in their query:

SELECT * FROM output WHERE ID < <last_insert_id_here> ORDER BY ID DESC LIMIT 10000

I do not fully understand this. For the first page, I need the very last insert ID. Then I fetch the 10,000 newest records. But after that, to get the second page, I don't need the very last insert ID, I need the 10,000th last insert ID (I guess).
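To make the seek method concrete, here is a minimal in-memory sketch (the names, like `fetchPage`, are my own). The key point: the first page simply omits the WHERE clause, so you never need to look up the newest ID up front; every later page filters on the last ID you already returned:

```javascript
// Stand-in for the table: 25 rows with ids 1..25.
const rows = Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }));

// Mimics: SELECT ... [WHERE id < cursor] ORDER BY id DESC LIMIT limit
function fetchPage(cursor, limit) {
  return rows
    .filter(r => cursor === null || r.id < cursor) // WHERE id < cursor
    .sort((a, b) => b.id - a.id)                   // ORDER BY id DESC
    .slice(0, limit);                              // LIMIT
}

let cursor = null;                      // first page: no WHERE clause
const pages = [];
for (;;) {
  const page = fetchPage(cursor, 10);
  if (page.length === 0) break;         // no rows left: done
  pages.push(page.map(r => r.id));
  cursor = page[page.length - 1].id;    // remember last id for the next page
}
```

With 25 rows and a page size of 10 this yields pages 25..16, 15..6 and 5..1; the loop stops when a page comes back empty.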

Can someone give me a good explanation of how to do pagination and filtering in a fast way?

The stuff I'm using:

- PostgreSQL
- pgAdmin (for database management)
- node.js (latest version)

Thanks everyone! And have a nice 2020!

EDIT 1: I have no clue, but could massiveJS (https://massivejs.org/) be something good to use? And should I use it on ALL queries, or only on the pagination queries?

EDIT 2: I THINK I got it figured out a little bit (correct me if I'm wrong).

Let's say I have 100,000 records:

1) Get the last inserted ID

2) Use this last inserted ID to fetch the last 10,000 records

SELECT * FROM output WHERE ID <= 100000 ORDER BY ID DESC LIMIT 10000 // <= rather than <, otherwise the newest record (ID 100,000) itself is skipped; for the first page you can also just drop the WHERE clause

3) Show the 10,000 records, but also save the ID of the 10,000th record to use in the next query

4) Get the next 10,000 records with the new last insert id

SELECT * FROM output WHERE ID < 90001 ORDER BY ID DESC LIMIT 10000 // 90,001 is the ID of the last row on the previous page (it only equals "newest ID minus 9,999" when there are no gaps)

5) ...

Is this correct?
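As a sanity check of the steps above (assuming IDs are contiguous with no gaps), a small simulation: with IDs 1..100,000 the first page covers 100,000 down to 90,001, and reusing the last row's ID as the cursor, the second page covers 90,000 down to 80,001:

```javascript
// In-memory check assuming contiguous ids 1..100000, page size 10000.
const total = 100000, pageSize = 10000;

// Mimics: SELECT id FROM output WHERE id < cursor ORDER BY id DESC LIMIT pageSize
function page(cursor) {
  const start = Math.min(cursor - 1, total); // highest id on this page
  const ids = [];
  for (let id = start; id > Math.max(start - pageSize, 0); id--) ids.push(id);
  return ids;
}

const first = page(total + 1);              // ids 100000 .. 90001
const nextCursor = first[first.length - 1]; // 90001, taken from the last row
const second = page(nextCursor);            // ids 90000 .. 80001
```

Note the cursor is the ID actually read from the last row of the page, not a value computed by subtracting 10,000; the arithmetic only coincides when no rows were ever deleted.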

asked Jan 06 '20 by Sam Leurs

2 Answers

Here's how I handle this. For the first page I fetch, I use

SELECT id, col, col, col 
  FROM output 
 ORDER BY id DESC
 LIMIT 10000

Then, in my client program (node.js) I capture the id value from the last row of the result set. When I need the next page, I do this.

 SELECT id, col, col, col
   FROM output
  WHERE id < my_captured_id_value
  ORDER BY id DESC
  LIMIT 10000

This exploits the index. And it works correctly even if you have deleted some rows from the table.
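A sketch of this pattern in node.js. `runQuery(text, values)` is an assumed helper that resolves to the row array of a parameterized query (e.g. a thin wrapper around pg's `client.query`); injecting it keeps the paging logic testable without a live database:

```javascript
// Async generator yielding one page of rows at a time, seeking on id.
// runQuery(text, values) -> Promise<rows> is supplied by the caller.
async function* pages(runQuery, pageSize = 10000) {
  let lastId = null;
  for (;;) {
    const rows = lastId === null
      // first page: no WHERE clause, just the newest rows
      ? await runQuery(
          'SELECT id, sensor_ID, speed FROM output ORDER BY id DESC LIMIT $1',
          [pageSize])
      // later pages: seek strictly below the last id already shown
      : await runQuery(
          'SELECT id, sensor_ID, speed FROM output WHERE id < $1 ORDER BY id DESC LIMIT $2',
          [lastId, pageSize]);
    if (rows.length === 0) return;       // empty page: we are done
    yield rows;
    lastId = rows[rows.length - 1].id;   // capture id of the last row
  }
}
```

Because the cursor is captured from the returned rows, the iteration stays correct even when rows have been deleted in between.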

By the way, you don't need a special descending index just because your first page shows the largest ids: a PostgreSQL btree index can be scanned in both directions, so a plain index on ID already serves ORDER BY id DESC. Declaring ASC/DESC in an index only matters for multicolumn sorts with mixed directions.

Pro tip: SELECT * is harmful to performance on large databases. Always list the columns you actually need.

answered Sep 22 '22 by O. Jones

In keyset pagination, the WHERE clause should mirror the columns in your ORDER BY clause, and for a column sorted DESC you should use < (and > for ASC). For the first page you can use something like this:

SELECT Col1, Col2, Col3 
FROM db.tbl 
WHERE Col3 LIKE '%search_term%'
ORDER BY Col1 DESC , Col2 ASC
LIMIT 10000

and for the next page, you send the values of Col1 and Col2 from the last row of the result back into the query, like this:

SELECT Col1, Col2, Col3 
FROM db.tbl 
WHERE Col3 LIKE '%search_term%'
AND ( Col1 < Col1_last_row_value OR ( Col1 = Col1_last_row_value AND Col2 > Col2_last_row_value ) )
ORDER BY Col1 DESC , Col2 ASC
LIMIT 10000

and on the server or client side you should check whether the query brought back any rows; if not, you are done and the loading icon of the "infinite scroll" can be hidden.
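One subtlety worth checking in code: for ORDER BY Col1 DESC, Col2 ASC, the next-page condition must be "Col1 < last OR (Col1 = last AND Col2 > last)"; a plain AND of both comparisons would drop rows that tie on Col1. A small sketch (`afterCursor` is an illustrative name):

```javascript
// True when a row sorts strictly after the cursor (lastCol1, lastCol2)
// under ORDER BY Col1 DESC, Col2 ASC -- i.e. it belongs on the next page.
function afterCursor(row, lastCol1, lastCol2) {
  return row.col1 < lastCol1 ||
         (row.col1 === lastCol1 && row.col2 > lastCol2);
}
```

For a cursor of (5, 3), the row (4, 0) belongs on the next page because its Col1 is smaller, yet "col1 < 5 AND col2 > 3" would wrongly reject it; the OR form accepts it.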

answered Sep 25 '22 by ali tekrar