Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Priority/Cascading datetime without full table scan

SELECT IF(priority_date, priority_date, created_at) as created_at
FROM table
WHERE IF(priority_date , priority_date , created_at) 
    BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59';

What is the best way to execute this query, performance-wise?

I have a fairly large table that has two datetimes. created_at and priority_date.

priority_date doesn't always exist, but if it does, it should be what is queried upon, else it falls back to created_at. created_at is always generated upon creation of the row. The above query causes a (nearly) full table scan.

The explain plan for initial query:

+------+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table           | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | table           | ALL  | NULL          | NULL | NULL    | NULL | 444877 | Using where |
+------+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+

I should also note that priority_date or created_at may not necessarily both be within the time frame in question on a single row. So doing something like:

WHERE priority_date BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
OR created_at BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'

Could give bad results if priority_date was 2017-10-04 23:10:43 and created_at was 2017-10-10 01:23:45

My current rows for said table: 582739

Count of WHERE priority_date BETWEEN... : 3908

Count of WHERE created_at BETWEEN...: 3437

Example explain of just one of the columns queried in WHERE BETWEEN:

+------+-------------+-----------------+-------+----------------------------------+----------------------------------+---------+------+------+-----------------------+
| id   | select_type | table           | type  | possible_keys                    | key                              | key_len | ref  | rows | Extra                 |
+------+-------------+-----------------+-------+----------------------------------+----------------------------------+---------+------+------+-----------------------+
|    1 | SIMPLE      | table           | range | table_created_at_index           | table_created_at_index           | 5       | NULL | 3436 | Using index condition |
+------+-------------+-----------------+-------+----------------------------------+----------------------------------+---------+------+------+-----------------------+

Clearly the IF is not the most efficient. The columns are indexed and the explains of individual rows are matching to their counts for rows on the explain plan. How can I leverage having a priority/fallback query without the wild performance loss?

EDIT

The best I've been able to figure (But WOW, is that verbose and copy/paste-y feeling)

SELECT IF(priority_date, priority_date, created_at) as created_at, priority_date
FROM table 
WHERE priority_date BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
    OR created_at BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59'
HAVING ((priority_date AND priority_date BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59')
    OR created_at BETWEEN '2017-10-10 00:00:00' AND '2017-10-10 23:59:59');

And its explain plan:

+------+-------------+-----------------+-------------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------+------+------+------------------------------------------------------------------------------------------------------+
| id   | select_type | table           | type        | possible_keys                                                         | key                                                                   | key_len | ref  | rows | Extra                                                                                                |
+------+-------------+-----------------+-------------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------+------+------+------------------------------------------------------------------------------------------------------+
|    1 | SIMPLE      | table           | index_merge | table_priority_date_index,table_created_at_index                      | table_priority_date_index,table_created_at_index                      | 6,5     | NULL | 7343 | Using sort_union(table_priority_date_index,table_created_at_index); Using where                      |
+------+-------------+-----------------+-------------+-----------------------------------------------------------------------+-----------------------------------------------------------------------+---------+------+------+------------------------------------------------------------------------------------------------------+
like image 305
Blake Avatar asked Jun 28 '26 01:06

Blake


1 Answers

First you need a compound index on (priority_date, created_at), then you can use a query like this:

SELECT IF(priority_date, priority_date, created_at) as created_at, priority_date
FROM table 
WHERE priority_date BETWEEN '2017-10-10' AND '2017-10-10 23:59:59'
  OR (priority_date IS NULL AND created_at BETWEEN '2017-10-10' AND '2017-10-10 23:59:59');

Having priority_date first in the compound index makes a big difference. No union is required.

Explain results on 400k rows with 2000 results:

Extra: Using where; Using index
key: priority_created_compound
rows: 2000
like image 53
Nealvs Avatar answered Jun 30 '26 13:06

Nealvs