 

Is it a bad idea to store row count and row number to speed up pagination?

My website has more than 20,000,000 entries; entries have categories (FK) and tags (M2M). Even for a query like SELECT id FROM table ORDER BY id LIMIT 1000000, 10, MySQL needs to scan 1,000,010 rows, which is unacceptably slow (primary keys, indexes, joins, etc. don't help much here; it is still 1,000,010 rows). So I am trying to speed up pagination by storing a row count per category and a row number per entry, maintained with a trigger like this:

DELIMITER //
-- Trigger names must be literals, and NEW.* can only be assigned
-- in a BEFORE trigger, so this is BEFORE INSERT rather than AFTER
CREATE TRIGGER entry_before_insert
BEFORE INSERT
ON entry_table FOR EACH ROW
BEGIN
    UPDATE category_table SET row_count = (@rc := row_count + 1)
    WHERE id = NEW.category_id;
    SET NEW.row_number_in_category = @rc;
END //
DELIMITER ;

And then I can simply:

SELECT *
FROM entry_table
WHERE category_id = 1            -- row numbers only make sense within one category
  AND row_number_in_category > 10
ORDER BY row_number_in_category
LIMIT 10

(Now only 10 rows are scanned, so selects are blazing fast. Inserts are slower, but they are rare compared to selects, so that is acceptable.)
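For this to be fast, I also need an index covering both the filter and the sort; something like the following (the index name is arbitrary):

-- Composite index: lets MySQL seek straight to the first matching
-- row number within a category and read the next 10 in index order
CREATE INDEX idx_category_rownum
    ON entry_table (category_id, row_number_in_category);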

Is this a bad approach, and are there any good alternatives?

Asked Nov 09 '22 by Bob

1 Answer

Although I like the solution in the question, it may present some issues if data in entry_table changes, for example if rows are deleted or reassigned to different categories over time.

It also limits the ways in which the data can be sorted: the method assumes data is only ever read in insert order. Covering multiple sort orders would require additional triggers and summary data, and deletes force renumbering, as sketched below.
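To see why deletes are awkward: MySQL will not let a trigger on entry_table update entry_table itself, so closing the numbering gap would have to happen in a stored procedure or in application code. A sketch, assuming the question's schema (the procedure name is made up):

DELIMITER //
CREATE PROCEDURE delete_entry(IN p_id INT)
BEGIN
    DECLARE v_cat INT;
    DECLARE v_rownum INT;

    SELECT category_id, row_number_in_category
      INTO v_cat, v_rownum
      FROM entry_table WHERE id = p_id;

    DELETE FROM entry_table WHERE id = p_id;

    -- Every later row in the category shifts down by one,
    -- so a single delete can rewrite millions of rows
    UPDATE entry_table
       SET row_number_in_category = row_number_in_category - 1
     WHERE category_id = v_cat
       AND row_number_in_category > v_rownum;

    UPDATE category_table
       SET row_count = row_count - 1
     WHERE id = v_cat;
END //
DELIMITER ;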

An alternative way of paginating, often called keyset (or seek) pagination, is to pass in the last-seen value of the field you are sorting/paginating by, instead of an offset to the LIMIT clause.

Instead of this:

SELECT id FROM table ORDER BY id LIMIT 1000000, 10

Do this, assuming in this scenario that the last result viewed had an id of 1000000:

SELECT id FROM table WHERE id > 1000000 ORDER BY id LIMIT 0, 10

If you track the last id seen on each page and pass it into the next query, the database never has to sort rows that cannot be part of the final result.
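If you need an ordering other than by id, keyset pagination still works by comparing a tuple of the sort column plus a unique tie-breaker. A sketch, assuming a created_at column (not in the question) and an index on (created_at, id); note that older MySQL versions may not use the index efficiently for row-constructor comparisons:

-- Hypothetical: paginate by created_at, breaking ties on id.
-- The (created_at, id) pair of the last row on the previous page is passed in.
SELECT id, created_at
FROM entry_table
WHERE (created_at, id) > ('2022-11-09 00:00:00', 1000000)
ORDER BY created_at, id
LIMIT 10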

If you really only want 10 rows out of 20 million, you could go further and guess that the next 10 matching rows will occur within the next 1000 ids, perhaps with some logic to repeat the query with a larger allowance if that turns out not to be the case.

SELECT id FROM table WHERE id BETWEEN 1000000 AND 1001000 ORDER BY id LIMIT 0, 10

This should be significantly faster because the BETWEEN range bounds the scan, so the sort can probably be completed in a single pass over a small slice of the index.
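If the guessed window comes back with fewer than 10 rows, the retry is just the same query with a wider range. For example (the 10x widening factor here is an arbitrary choice):

-- Retry after a short page: widen the guessed window from 1,000 to 10,000 ids
SELECT id FROM table WHERE id BETWEEN 1000000 AND 1010000 ORDER BY id LIMIT 0, 10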

Answered Nov 15 '22 by Steve E.