I am using a MySQL DB, and have the following table:
CREATE TABLE SomeTable (
PrimaryKeyCol BIGINT(20) NOT NULL,
A BIGINT(20) NOT NULL,
FirstX INT(11) NOT NULL,
LastX INT(11) NOT NULL,
P INT(11) NOT NULL,
Y INT(11) NOT NULL,
Z INT(11) NOT NULL,
B BIGINT(20) DEFAULT NULL,
PRIMARY KEY (PrimaryKeyCol),
UNIQUE KEY FirstLastXPriority_Index (FirstX,LastX,P)
) ENGINE=InnoDB;
The table contains 4.3 million rows, and never changes once initialized.
The important columns of this table are FirstX
, LastX
, Y
, Z
and P
.
As you can see, I have a unique index on the rows FirstX
, LastX
and P
.
The columns FirstX
and LastX
define a range of integers.
The query I need to run on this table fetches for a given X all the rows having FirstX <= X <= LastX (i.e. all the rows whose range contains the input number X).
For example, if the table contains the rows (I'm including only the relevant columns):
FirstX | LastX | P | Y | Z |
---|---|---|---|---|
100000 | 500000 | 1 | 111 | 222 |
150000 | 220000 | 2 | 333 | 444 |
180000 | 190000 | 3 | 555 | 666 |
550000 | 660000 | 4 | 777 | 888 |
700000 | 900000 | 5 | 999 | 111 |
750000 | 850000 | 6 | 222 | 333 |
and I need, for example, the rows that contain the value 185000
, the first 3
rows should be returned.
The query I tried, which should be using the index, is:
SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND LastX >= ? LIMIT 10;
Even without the LIMIT, this query should return a small number of records (less than 50
) for any given X.
This query was executed by a Java application for 120000
values of X. To my surprise, it took over 10 hours (!) and the average time per query was 0.3 seconds.
This is not acceptable, not even near acceptable. It should be much faster.
I examined a single query that took 0.563 seconds to make sure the index was being used. The query I tried (the same as the query above with a specific integer value instead of ?
) returned 2 rows.
I used EXPLAIN
to find out what was happening:
id 1
select_type SIMPLE
table SomeTable
type range
possible_keys FirstLastXPriority_Index
key FirstLastXPriority_Index
key_len 4
ref NULL
rows 2104820
Extra Using index condition
As you can see, the execution involved 2104820
rows (nearly 50% of the rows of the table), even though only 2 rows satisfy the conditions, so half of the index is examined in order to return just 2 rows.
Is there something wrong with the query or the index? Can you suggest an improvement to the query or the index?
EDIT:
Some answers suggested that I run the query in batches for multiple values of X. I can't do that, since I run this query in real time, as inputs arrive to my application. Each time an input X arrives, I must execute the query for X and perform some processing on the output of the query.
Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.
Just make a really bad query off-index or force a normally-good query to use LOOP JOINs when it should be using HASH/MERGE ;-) A few self cross joins will slow things up nicely... And a large result set would cause IO to be the bottleneck.
I found a solution that relies on properties of the data in the table. I would rather have a more general solution that doesn't depend on the current data, but for the time being that's the best I have.
The problem with the original query:
SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND LastX >= ? LIMIT 10;
is that the execution may require scanning a large percentage of the entries in the FirstX
,LastX
,P
index when the first condition FirstX <= ?
is satisfied by a large percentage of the rows.
What I did to reduce the execution time is observe that LastX-FirstX
is relatively small.
I ran the query:
SELECT MAX(LastX-FirstX) FROM SomeTable;
and got 4200000
.
This means that FirstX >= LastX – 4200000
for all the rows in the table.
So in order to satisfy LastX >= ?
, we must also satisfy FirstX >= ? – 4200000
.
So we can add a condition to the query as follows:
SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND FirstX >= ? - 4200000 AND LastX >= ? LIMIT 10;
In the example I tested in the question, the number of index entries processed was reduced from 2104820
to 18
and the running time was reduced from 0.563 seconds to 0.0003 seconds.
I tested the new query with the same 120000
values of X
. The output was identical to the old query. The time went down from over 10 hours to 5.5 minutes, which is over 100 times faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With