Optimize SQL that uses between clause

Question

Consider the following 2 tables:

Table A:
id
event_time

Table B
id
start_time
end_time

Every record in table A is mapped to exactly 1 record in table B. This means table B has no overlapping periods. Many records from table A can be mapped to the same record in table B.

I need a query that returns all A.id, B.id pairs. Something like:

SELECT A.id, B.id 
FROM A, B 
WHERE A.event_time BETWEEN B.start_time AND B.end_time

I am using MySQL and I cannot optimize this query. With ~980 records in table A and 130.000 in table B this takes forever. I understand this has to perform 980 queries, but taking more than 15 minutes on a beefy machine is strange. Any suggestions?

P.S. I cannot change the database schema, but I can add indexes. However an index (with 1 or 2 fields) on the time fields doesn't help.

Kibbee · Accepted Answer

You may want to try something like this

Select A.ID,
(SELECT B.ID FROM B
WHERE A.EventTime BETWEEN B.start_time AND B.end_time LIMIT 1) AS B_ID
FROM A

If you have an index on the Start_Time,End_Time fields for B, then this should work quite well.

Bill Karwin · Answer

I'm not sure this can be optimized fully. I tried it on MySQL 5.1.30. I also added an index on {B.start_time, B.end_time} as suggested by other folks. Then I got a report from EXPLAIN, but the best I could get is a Range Access Method:

EXPLAIN SELECT A.id, B.id FROM A JOIN B 
ON A.event_time BETWEEN B.start_time AND B.end_time;

+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                          |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
|  1 | SIMPLE      | A     | ALL  | event_time    | NULL | NULL    | NULL |    8 |                                                | 
|  1 | SIMPLE      | B     | ALL  | start_time    | NULL | NULL    | NULL |   96 | Range checked for each record (index map: 0x4) | 
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+

See the note on the far right. The optimizer thinks it might be able to use the index on {B.start_time, B.end_time} but it ended up deciding not to use that index. Your results may vary, because your data distribution is more representative.

Compare with the index usage if you compare A.event_time to a constant range:

EXPLAIN SELECT A.id FROM A
WHERE A.event_time BETWEEN '2009-02-17 09:00' and '2009-02-17 10:00';

+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | A     | range | event_time    | event_time | 8       | NULL |    1 | Using where | 
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+

And compare with the dependent sub-query form given by @Luke and @Kibbee, which seems to make use of indexes more effectively:

EXPLAIN SELECT A.id AS id_from_a,
    (
        SELECT B.id
        FROM B
        WHERE A.id BETWEEN B.start_time AND B.end_time
        LIMIT 0, 1
    ) AS id_from_b
FROM A;

+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | A     | index | NULL          | PRIMARY | 8       | NULL |    8 | Using index | 
|  2 | DEPENDENT SUBQUERY | B     | ALL   | start_time    | NULL    | NULL    | NULL |  384 | Using where | 
+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+

Weirdly, EXPLAIN lists possible_keys as NULL (i.e. no indexes could be used) but then decides to use the primary key after all. Could be an idiosyncrasy of MySQL's EXPLAIN report?

LukeH · Answer

I wouldn't normally recommend a query like this, but...

Since you've specified that table A only has about 980 rows and that each row maps to exactly one row in table B, then you could do the following and it will most likely be a lot faster than a cartesian join:

SELECT A.id AS id_from_a,
    (
        SELECT B.id
        FROM B
        WHERE A.event_time BETWEEN B.start_time AND B.end_time
        LIMIT 0, 1
    ) AS id_from_b
FROM A

Optimize SQL that uses between clause

Tags:

sql

mysql

query-optimization

daremon

3 Answers

Kibbee

Bill Karwin

LukeH

Recent Activity

Donate For Us

Optimize SQL that uses between clause

Tags:

sql

mysql

query-optimization

daremon

3 Answers

Kibbee

Bill Karwin

LukeH

Related questions

Recent Activity

Donate For Us