Is it still correct than an index on a date/datetime column is not optimized for YEAR(col), MONTH(col) functions? Bill Karwin gave a pretty definitive answer here, but since this was from ten years ago I wanted to check. I would have thought that since the date column is described as
A three-byte integer packed as YYYY×16×32 + MM×32 + DD
that it could be indexed in such a way that it could be optimized in the same way that multi column indexes are when a tight index scan is performed.
Is it true this optimization still does not exist? And why wouldn't it be possible?
Correct. MySQL optimizer won't use an index once a column in the WHERE clause is wrapped with a function. The simple reasoning is that once you have an index on column my_date
, the database sees no point to search in it when you're actually looking for a different value, for example the output of YEAR(my_date).
The good news though are that there is a quick and simple solution in some cases.
For example, the following query can be optimized using a different approach:
SELECT flight_company, count(*)
FROM flight_times
WHERE year(FlightDate) = 2017
group by flight_company
Instead of using the YEAR() function in the WHERE clause, you can just replace the function call with a range condition that will check the same thing:
SELECT flight_company, count(*)
FROM flight_times
WHERE FlightDate between '2017-01-01'
and '2017-12-31'
GROUP BY flight_company
There are functions though that can't simply replaced by a range condition such as in YEAR's case. For example, how would you replace dayofweek() with a condition? Probably harder. Therefore, another approach is to use MySQL 5.7 Virtual (generated) columns. If you're taking this approach, you can just create this virtual column in your CREATE TABLE statement, to actually match the result of dayofweek():
Flight_dayofweek tinyint(4)
GENERATED ALWAYS AS (dayofweek(FlightDate
VIRTUAL
As the co-founder of EverSQL, I will humbly suggest you to use EverSQL SQL Query Optimizer to automatically optimize such queries.
The problem is not the representation of the date. The problem is the optimization of the query. When you use YEAR()
and MONTH()
on a date column, the column is a function of an argument.
That means that the compiler would have a lot of information about the function in order to optimize it using an index scan or index lookup. Such as:
Of course, these things are possible. The challenge is actually building them into a function definition and into the optimizer. No one who builds SQL optimizers would want to add special cases for a specific function (well, almost no one; SQL Server does allow some use of cast()
to take advantage of an index).
So, the answer to your question is that Bill's answer is still valid.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With