Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

INDEX on DATE field and MONTH() YEAR() functions

Tags:

sql

mysql

Is it still correct than an index on a date/datetime column is not optimized for YEAR(col), MONTH(col) functions? Bill Karwin gave a pretty definitive answer here, but since this was from ten years ago I wanted to check. I would have thought that since the date column is described as

A three-byte integer packed as YYYY×16×32 + MM×32 + DD

that it could be indexed in such a way that it could be optimized in the same way that multi column indexes are when a tight index scan is performed.

Is it true this optimization still does not exist? And why wouldn't it be possible?

like image 251
Alden W. Avatar asked Oct 26 '16 01:10

Alden W.


Video Answer


2 Answers

Correct. MySQL optimizer won't use an index once a column in the WHERE clause is wrapped with a function. The simple reasoning is that once you have an index on column my_date, the database sees no point to search in it when you're actually looking for a different value, for example the output of YEAR(my_date).

The good news though are that there is a quick and simple solution in some cases.

For example, the following query can be optimized using a different approach:

SELECT flight_company, count(*)
FROM flight_times
WHERE year(FlightDate) = 2017
group by flight_company

Instead of using the YEAR() function in the WHERE clause, you can just replace the function call with a range condition that will check the same thing:

SELECT flight_company, count(*) 
FROM flight_times
WHERE FlightDate between '2017-01-01'
and '2017-12-31'
GROUP BY flight_company

There are functions though that can't simply replaced by a range condition such as in YEAR's case. For example, how would you replace dayofweek() with a condition? Probably harder. Therefore, another approach is to use MySQL 5.7 Virtual (generated) columns. If you're taking this approach, you can just create this virtual column in your CREATE TABLE statement, to actually match the result of dayofweek():

Flight_dayofweek tinyint(4)
 GENERATED ALWAYS AS (dayofweek(FlightDate
 VIRTUAL

As the co-founder of EverSQL, I will humbly suggest you to use EverSQL SQL Query Optimizer to automatically optimize such queries.

like image 93
Tomer Shay Avatar answered Oct 19 '22 12:10

Tomer Shay


The problem is not the representation of the date. The problem is the optimization of the query. When you use YEAR() and MONTH() on a date column, the column is a function of an argument.

That means that the compiler would have a lot of information about the function in order to optimize it using an index scan or index lookup. Such as:

  • The function would have to return the same values given the same arguments.
  • The function would take only one argument.
  • The function would need to be monotonic.
  • The function would need to have an inverse (for starting an index scan).

Of course, these things are possible. The challenge is actually building them into a function definition and into the optimizer. No one who builds SQL optimizers would want to add special cases for a specific function (well, almost no one; SQL Server does allow some use of cast() to take advantage of an index).

So, the answer to your question is that Bill's answer is still valid.

like image 7
Gordon Linoff Avatar answered Oct 19 '22 11:10

Gordon Linoff