My table 'my_logs' have about 20,000,000 records, and I want to find out how many logs I have in each date within a few days.
I want to have a result like
+------------+---------+
| date | count |
+------------+---------+
| 2016-07-01 | 1623 |
| 2016-07-02 | 1280 |
| 2016-07-03 | 2032 |
+------------+---------+
This query below only take me milliseconds to finish, that's good
SELECT DATE_FORMAT(created_at, '%Y-%m-%d') as date,
COUNT(*) as count
FROM my_logs
WHERE created_at BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(created_at, '%Y-%m-%d')
The Explain of query:
+------------+---------+-------+-----------------------------+
|select_type | table | type | possible_keys |
+------------+---------+-------+-----------------------------+
| SIMPLE | my_logs| index | index_my_logs_on_created_at |
+------------+---------+-------+-----------------------------+
+-----------------------------+---------+----------+
| key | key_len | rows |
+-----------------------------+---------+----------+
| index_my_logs_on_created_at | 10 | 23458462 |
+-----------------------------+---------+----------+
+-----------------------------------------------------------+
| Extra |
+-----------------------------------------------------------+
| Using where; Using index; Using temporary; Using filesort |
+-----------------------------------------------------------+
However, I need to convert the timezone of each record to fit the time in my country, and I need to group by the 'Date' information, so I need to convert the column itself.
Both
SELECT COUNT(*)
FROM my_logs
WHERE DATE_ADD(created_at, INTERVAL 8 HOUR) BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(DATE_ADD(created_at, INTERVAL 8 HOUR), '%Y-%m-%d')
and
SELECT COUNT(*)
FROM my_logs
WHERE CONVERT_TZ(created_at, "+00:00", "+08:00") BETWEEN '2016-07-01' AND '2016-07-04'
GROUP BY DATE_FORMAT(CONVERT_TZ(created_at, "+00:00", "+08:00"),
'%Y-%m-%d')
take me about 12s to finish the query, it is unbearable slow!!
(The Explain is the same as the query in the top)
I think it is common problem but I can't find a good way to deal with it, does anyone has a more efficient way to do it? Thanks!
In MySQL the CONVERT_TZ() returns a resulting value after converting a datetime value from a time zone specified as the second argument to the time zone specified as the third argument. This function returns NULL when the arguments are invalid. A datetime. A time zone which will be converted to to_tz.
The following is the syntax to get the current time zone of MySQL. mysql> SELECT @@global. time_zone, @@session.
MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME .) By default, the current time zone for each connection is the server's time.
The other way is to use the MySQL SEC_TO_TIME() function. The MySQL SEC_TO_TIME() function is used to return a time value in format HH:MM:SS from the specified number of seconds. In other words, it converts a value in seconds only to the format HH:MM:SS.
Which datatype, TIMESTAMP
vs. DATETIME
, did you use? (But, I'll ignore that.)
Do not "hide" an indexed column (created_at
) inside any function (CONVERT_TZ()
). It makes it so that the WHERE
clause cannot use the index and must scan the table instead. This fix is simple:
WHERE created_at >= '2016-07-01' - INTERVAL 8 HOUR
AND created_at < '2016-07-04' - INTERVAL 8 HOUR
(or use CONVERT_TZ
). Note that I also fixed the bug wherein you included midnight from the 4th. Note: Even + INTERVAL...
is effectively a function.
Expressions in the SELECT
and the GROUP BY
are far less critical to performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With