
How can I speed up a MySQL query with a large offset in the LIMIT clause?

I'm running into performance problems when using LIMIT with a large offset in a MySQL SELECT:

SELECT * FROM table LIMIT m, n;

If the offset m is, say, larger than 1,000,000, the operation is very slow.

I do have to use LIMIT m, n; I can't use something like WHERE id > 1,000,000 LIMIT n.

How can I optimize this statement for better performance?

ZA. asked Aug 07 '09 10:08

People also ask

Does limit 1 speed up query?

The answer, in short, is yes. If you limit your result to 1, then even if you are "expecting" one result, the query will be faster because your database won't look through all your records. It will simply stop once it finds a record that matches your query.

Is offset slow SQL?

Why is OFFSET so slow? Well, in most cases, low-offset queries are not slow; the problem starts with high OFFSET values. If your query uses the clause LIMIT 50000, 20, it's actually asking the database to read through 50,020 rows and throw away the first 50,000.

What is offset in limit MySQL?

OFFSET in MySQL specifies the row from which to start retrieving data. It is used together with LIMIT, which restricts the number of rows in the output.


5 Answers

Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a WHERE clause to more efficiently get the rows you want.

#create table to store sequences
CREATE TABLE seq (
   seq_no INT NOT NULL AUTO_INCREMENT,
   id INT NOT NULL,
   PRIMARY KEY (seq_no),
   UNIQUE (id)
);

#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;

#now get 1000 rows from offset 1000000 (seq_no is 1-based, so this
#matches LIMIT 1000000, 1000)
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000001 AND 1001000;
Paul Dixon answered Oct 10 '22 18:10

If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or you could formulate a WHERE clause using the min and max ids from the first query).

slow:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

fast:

SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

SELECT * FROM table WHERE id IN (1,2,3...10)
Scott Nelson answered Oct 10 '22 18:10


There's a blog post somewhere on the internet arguing that the selection of the rows to show should be as compact as possible (just the ids), and that producing the complete results should in turn fetch all the data you want for only the rows you selected.

Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):

select A.* from table A 
  inner join (select id from table order by whatever limit m, n) B
  on A.id = B.id
order by A.whatever

If your SQL engine is too primitive to allow this kind of SQL statement, or it doesn't improve anything, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.
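A minimal sketch of that multiple-statement approach in application code, using SQLite purely for illustration (the table and column names here are hypothetical, not from the question):

```python
import sqlite3

# Throwaway in-memory table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO items (id, payload) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 101)])

def fetch_page(conn, offset, page_size):
    # Step 1: a cheap query captures just the ids into a Python list.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset))]
    if not ids:
        return []
    # Step 2: fetch the full rows for only those ids.
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, payload FROM items WHERE id IN ({placeholders}) "
        "ORDER BY id", ids).fetchall()

page = fetch_page(conn, 50, 10)  # rows 51..60
```

The id list acts as the "data structure" between the two statements; the second query touches only the rows that will actually be shown.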

Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.

like image 37
bart Avatar answered Oct 10 '22 20:10

bart


I don't think there's any need to create a separate index if your table already has a primary key. If so, then you can order by this key and then use values of the key to step through:

SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC LIMIT :PAGE_SIZE;

Another optimisation would be not to use SELECT * but just the id, so that the engine can simply read the index and doesn't have to then locate all the data (reducing IO overhead). If you need some of the other columns, then perhaps you could add these to the index so that they are read along with the primary key (which will most likely be held in memory and therefore not require a disc lookup). This will not be appropriate for all cases, so you will have to have a play.
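A sketch of that covering-index idea, again using SQLite for illustration (hypothetical table and column names; InnoDB's clustered-index details differ, but the principle that an index containing all selected columns avoids row lookups is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER, name TEXT, payload TEXT)")
conn.executemany("INSERT INTO big VALUES (?, ?, ?)",
                 [(i, f"name-{i}", "x" * 100) for i in range(1, 1001)])

# Index both columns the query needs, so the engine never touches table rows.
conn.execute("CREATE INDEX idx_id_name ON big (id, name)")

query = "SELECT id, name FROM big WHERE id > 900 ORDER BY id LIMIT 10"
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
rows = conn.execute(query).fetchall()
```

Inspecting `plan` shows SQLite reporting a COVERING INDEX search: the query is answered from `idx_id_name` alone, with no lookups into the wide `payload` data.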

I wrote an article with more details:

http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/

SlappyTheFish answered Oct 10 '22 20:10


Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no gaps in it.

If that's feasible, a better solution would be to simply ensure that the original table has no id gaps and starts from id 1. Then grab the rows using the id for pagination.

SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;

and so on...
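The "and so on" can be automated in application code; a sketch using SQLite and a hypothetical table, where remembering the last id seen (rather than using fixed ranges) keeps the loop correct even if a few ids happen to be missing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
# Insert rows with a deliberate gap at id 13 to show the loop tolerates it.
conn.executemany("INSERT INTO t (id, payload) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 26) if i != 13])

def pages(conn, page_size):
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM t WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size)).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume just after the last id we saw

all_pages = list(pages(conn, 10))
```

Each iteration is an indexed range scan starting at `last_id`, so the cost per page stays constant no matter how deep the pagination goes.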

Jackson Leung answered Oct 10 '22 20:10