Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL paging large data based on a specific order

Good Morning,

I have a table that contains couple million rows and I need to view the data ordered by the TimeStamp.

when I tried to do this

SELECT * FROM table ORDER BY date DESC offset 0 LIMIT 200

the MySQL will order all the data and then will response with the 200 rows and this is a performance issue. because its not wise to order everything each time I want to scroll the page !

do you have any idea on how we could improve the performance ?

like image 507
daigoor Avatar asked Oct 20 '22 23:10

daigoor


1 Answers

Firstly you need to create an index based on the date field. This allows the rows to be retrieved in order without having to sort the entire table every time a request is made.

Secondly, paging based on index gets slower the deeper you delve into the result set. To illustrate:

  • ORDER BY indexedcolumn LIMIT 0, 200 is very fast because it only has to scan 200 rows of the index.

  • ORDER BY indexedcolumn LIMIT 200, 200 is relatively fast, but requires scanning 400 rows of the index.

  • ORDER BY indexedcolumn LIMIT 660000, 200 is very slow because it requires scanning 660,200 rows of the index.

    Note: even so, this may still be significantly faster than not having an index at all.

You can fix this in a few different ways.

  1. Implement value-based paging, so you're paging based on the value of the last result on the previous page. For example:

    WHERE indexedcolumn>[lastval] ORDER BY indexedcolumn LIMIT 200 replacing [lastval] with the value of the last result of the current page. The index allows random access to a particular value, and proceeding forward or backwards from that value.

  2. Only allow users to view the first X rows (eg. 1000). This is no good if the value they want is the 2529th value.

  3. Think of some logical way of breaking up your large table, for example by the first letter, the year, etc so users never have to encounter the entire result set of millions of rows, instead they need to drill down into a specific subset first, which will be a smaller set and quicker to sort.

If you're combining a WHERE and an ORDER BY you'll need to reflect this in the design of your index to enable MySQL to continue to benefit from the index for sorting. For example if your query is:

SELECT * FROM mytable WHERE year='2012' ORDER BY date LIMIT 0, 200

Then your index will need to be on two columns (year, date) in that order.

If your query is:

SELECT * FROM mytable WHERE firstletter='P' ORDER BY date LIMIT 0, 200

Then your index will need to be on the two columns (firstletter, date) in that order.

The idea is that an index on multiple columns allows sorting by any column as long as you specified previous columns to be constants (single values) in a condition. So an index on A, B, C, D and E allows sorting by C if you specify A and B to be constants in a WHERE condition. A and B cannot be ranges.

like image 103
thomasrutter Avatar answered Oct 27 '22 19:10

thomasrutter