Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A SQL query searching for rows that satisfy Column1 <= X <= Column2 is very slow

I am using a MySQL DB, and have the following table:

CREATE TABLE SomeTable (
  PrimaryKeyCol BIGINT(20) NOT NULL,
  A BIGINT(20) NOT NULL,
  FirstX INT(11) NOT NULL,
  LastX INT(11) NOT NULL,
  P INT(11) NOT NULL,
  Y INT(11) NOT NULL,
  Z INT(11) NOT NULL,
  B BIGINT(20) DEFAULT NULL,
  PRIMARY KEY (PrimaryKeyCol),
  UNIQUE KEY FirstLastXPriority_Index (FirstX,LastX,P)
) ENGINE=InnoDB;

The table contains 4.3 million rows, and never changes once initialized.

The important columns of this table are FirstX, LastX, Y, Z and P.

As you can see, I have a unique index on the rows FirstX, LastX and P.

The columns FirstX and LastX define a range of integers.

The query I need to run on this table fetches for a given X all the rows having FirstX <= X <= LastX (i.e. all the rows whose range contains the input number X).

For example, if the table contains the rows (I'm including only the relevant columns):

FirstX LastX P Y Z
100000 500000 1 111 222
150000 220000 2 333 444
180000 190000 3 555 666
550000 660000 4 777 888
700000 900000 5 999 111
750000 850000 6 222 333

and I need, for example, the rows that contain the value 185000, the first 3 rows should be returned.

The query I tried, which should be using the index, is:

SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND LastX >= ? LIMIT 10;

Even without the LIMIT, this query should return a small number of records (less than 50) for any given X.

This query was executed by a Java application for 120000 values of X. To my surprise, it took over 10 hours (!) and the average time per query was 0.3 seconds.

This is not acceptable, not even near acceptable. It should be much faster.

I examined a single query that took 0.563 seconds to make sure the index was being used. The query I tried (the same as the query above with a specific integer value instead of ?) returned 2 rows.

I used EXPLAIN to find out what was happening:

id               1
select_type      SIMPLE
table            SomeTable 
type             range
possible_keys    FirstLastXPriority_Index
key              FirstLastXPriority_Index 
key_len          4
ref              NULL
rows             2104820
Extra            Using index condition

As you can see, the execution involved 2104820 rows (nearly 50% of the rows of the table), even though only 2 rows satisfy the conditions, so half of the index is examined in order to return just 2 rows.

Is there something wrong with the query or the index? Can you suggest an improvement to the query or the index?

EDIT:

Some answers suggested that I run the query in batches for multiple values of X. I can't do that, since I run this query in real time, as inputs arrive to my application. Each time an input X arrives, I must execute the query for X and perform some processing on the output of the query.

like image 839
Eran Avatar asked Dec 13 '17 17:12

Eran


People also ask

Why is MySQL query so slow?

Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.

How do I make a SQL query slow?

Just make a really bad query off-index or force a normally-good query to use LOOP JOINs when it should be using HASH/MERGE ;-) A few self cross joins will slow things up nicely... And a large result set would cause IO to be the bottleneck.


1 Answers

I found a solution that relies on properties of the data in the table. I would rather have a more general solution that doesn't depend on the current data, but for the time being that's the best I have.

The problem with the original query:

SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND LastX >= ? LIMIT 10;

is that the execution may require scanning a large percentage of the entries in the FirstX,LastX,P index when the first condition FirstX <= ? is satisfied by a large percentage of the rows.

What I did to reduce the execution time is observe that LastX-FirstX is relatively small.

I ran the query:

SELECT MAX(LastX-FirstX) FROM SomeTable;

and got 4200000.

This means that FirstX >= LastX – 4200000 for all the rows in the table.

So in order to satisfy LastX >= ?, we must also satisfy FirstX >= ? – 4200000.

So we can add a condition to the query as follows:

SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND FirstX >= ? - 4200000 AND LastX >= ? LIMIT 10;

In the example I tested in the question, the number of index entries processed was reduced from 2104820 to 18 and the running time was reduced from 0.563 seconds to 0.0003 seconds.

I tested the new query with the same 120000 values of X. The output was identical to the old query. The time went down from over 10 hours to 5.5 minutes, which is over 100 times faster.

like image 166
Eran Avatar answered Oct 20 '22 20:10

Eran