On a simple but very large Innodb table, I have a unique index on column A and I want to get a list of (integer) column B in order of (integer) column A
Very simple query, I am paging through millions of records.
SELECT B FROM hugeTable ORDER BY A LIMIT 10000 OFFSET 500000
This takes 10 seconds per query on a very fast server?
Filesort: Yes Filesort_on_disk: Yes Merge_passes: 9
This makes no sense to me, why can it not use Index A ?
Explain shows simple, no possible keys and filesort.
The creating sort index state appears when a query with an ORDER BY or GROUP BY clause can't use an existing index to perform the operation. In this case, MySQL needs to perform a more expensive filesort operation. This operation is typically performed in memory if the result set isn't too large.
In MySQL, filesort is the catch-all algorithm for producing sorted results for ORDER-BY or GROUP-BY queries. MySQL has two algorithms for filesort, both the original and the modified algorithms are described in the user manual.
MySQL obtains an exclusive lock on the table so that it can very quickly load data into the table. There is very little overhead in the LOAD DATA process, just the bare minimum parsing is done to make it work.
This means that MySQL has some data stored on the disk (or in memory) which is yet to be read and sent over. It may be the table itself, an index, a temporary table, a sorted output etc.
If the values for column B are not available in the index pages, then MySQL will need to access pages in the underlying table. Also there no predicate that filters which rows are being considered, and that means MySQL is seeing that ALL rows need to be returned. This could explain why the index is not being used.
Also note that the LIMIT
operations are processed at the end of the statement, as nearly the last step in the execution plan, with a some exceptions.
8.2.1.3. Optimizing LIMIT Queries http://dev.mysql.com/doc/refman/5.5/en/limit-optimization.html
I suspect that your query could make use of a covering index, for example "ON hugetable (A,B)
", to avoid the sort operation.
Absent a covering index, you could try rewriting the query something like this, to see if this will make use of the index on column A, and avoid a sort operation on millions of rows (to get the first 510,000 rows returned in order):
SELECT i.B
FROM ( SELECT j.A
FROM hugeTable j
ORDER
BY j.A
LIMIT 10000 OFFSET 500000
) k
JOIN hugetable i
ON i.A = k.A
ORDER
BY k.A
I suggest you do an EXPLAIN
on just the inline view query (aliased as k), and see if it shows "Using index
."
The outer query is likely to still have a "Using filesort
" operation, but at least that will be on only 10,000 rows.
(NOTE: You may want to try an "ORDER BY i.A
" in place of "k.A
" on the outer query, and see if that makes a difference.)
ADDENDUM
Not specifically addressing your question, but in terms of performance of that query, if this is "paging through" a set of rows, another option to consider, to get to the "next" page is to use the value of "A
" from the last row retrieved on the previous query as a "starting point" for the next row.
The original query looks like it's getting "page 51" (10,000 rows per page, page 51 would be rows 510,001 thru 520,000).
If you were to also return the value of 'A', and keep that for the last row. To get the "next" page, the query could actually be:
SELECT i.B, k.A
FROM ( SELECT j.A
FROM hugeTable j
WHERE j.A > $value_of_A_from_row_520000
-- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ORDER BY j.A ASC
LIMIT 10000
) k
JOIN hugetable i
ON i.A = k.A
ORDER
BY k.A
If you also kept the value for A from the "first" row, you could use that for backing up a page. That would really only work for forward one page, or back one page. Jumping to a different page, would have to use the original form of the query, counting rows.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With