
Grouping MySQL datetime into intervals irrespective of timezone

This question has been asked before but I am facing a slightly different problem.

I have a table that logs events and stores their timestamps (as datetime). I need to break time up into chunks and get the number of events that occurred in each chunk. The interval is configurable (say, from 5 minutes to 1 hour and beyond).

The obvious solution is to convert the datetime with unix_timestamp, divide it by the number of seconds in the interval, floor the result, and multiply it back by the number of seconds. Finally, convert the Unix timestamp back to datetime format with from_unixtime.

This works fine for small intervals.

select 
from_unixtime(floor(unix_timestamp(event.timestamp)/300)*300) as start_time,
count(*) as total 
from event 
where timestamp>='2012-08-03 00:00:00' 
group by start_time;

This gives the correct output:

+---------------------+-------+
| start_time          | total |
+---------------------+-------+
| 2012-08-03 00:00:00 |    11 |
| 2012-08-03 00:05:00 |     4 |
| 2012-08-03 00:10:00 |     4 |
| 2012-08-03 00:15:00 |     7 |
| 2012-08-03 00:20:00 |     8 |
| 2012-08-03 00:25:00 |     1 |
| 2012-08-03 00:30:00 |     1 |
| 2012-08-03 00:35:00 |     3 |
| 2012-08-03 00:40:00 |     3 |
| 2012-08-03 00:45:00 |     5 |
~~~~~OUTPUT SNIPPED~~~~~~~~~~~~

But if I increase the interval to, say, 1 hour (3600 seconds), the buckets start on the half hour:

mysql> select from_unixtime(floor(unix_timestamp(event.timestamp)/3600)*3600) as start_time, count(*) as total from event where timestamp>='2012-08-03 00:00:00' group by start_time;
+---------------------+-------+
| start_time          | total |
+---------------------+-------+
| 2012-08-02 23:30:00 |    35 |
| 2012-08-03 00:30:00 |    30 |
| 2012-08-03 01:30:00 |    12 |
| 2012-08-03 02:30:00 |    18 |
| 2012-08-03 03:30:00 |    12 |
| 2012-08-03 04:30:00 |     4 |
| 2012-08-03 05:30:00 |     3 |
| 2012-08-03 06:30:00 |    13 |
| 2012-08-03 07:30:00 |   269 |
| 2012-08-03 08:30:00 |   681 |
| 2012-08-03 09:30:00 |  1523 |
| 2012-08-03 10:30:00 |   911 |
+---------------------+-------+

The reason, as far as I can tell, is that unix_timestamp converts the time from my local time zone (GMT+05:30) to UTC before returning the numeric value.

So a value like 2012-08-03 00:00:00 is actually treated as 2012-08-02 18:30:00 UTC. Dividing and flooring sets the minutes part to 00 in UTC. But from_unixtime then converts the result back to GMT+05:30, which gives me intervals that begin at 30 minutes past the hour.
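The arithmetic described above can be reproduced outside MySQL. The following is a minimal Python sketch, not MySQL's actual implementation: it assumes a fixed +05:30 offset in place of the session time zone, and the helper name epoch_floor_bucket is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +05:30 offset stands in for the MySQL session time zone.
LOCAL_TZ = timezone(timedelta(hours=5, minutes=30))


def epoch_floor_bucket(local_dt, interval_seconds, tz=LOCAL_TZ):
    """Mimic from_unixtime(floor(unix_timestamp(x) / n) * n) for a naive local datetime."""
    # Local wall-clock time -> seconds since the Unix epoch (UTC).
    epoch = int(local_dt.replace(tzinfo=tz).timestamp())
    # Floor to a multiple of the interval; the boundary is aligned in UTC.
    floored = (epoch // interval_seconds) * interval_seconds
    # Convert back to local wall-clock time, as from_unixtime would.
    return datetime.fromtimestamp(floored, tz).replace(tzinfo=None)


event = datetime(2012, 8, 3, 0, 10, 0)
print(epoch_floor_bucket(event, 300))   # 2012-08-03 00:10:00
print(epoch_floor_bucket(event, 3600))  # 2012-08-02 23:30:00
```

The 5-minute case comes out right only because a 30-minute offset is a whole multiple of 300 seconds. It is not a multiple of 3600 seconds, so the hourly boundaries land on the half hour.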

How do I ensure the query works correctly irrespective of the time zone? I use MySQL 5.1.52, so to_seconds() is not available.

EDIT: The query should also work correctly irrespective of the interval (hours, minutes, or days). A generic solution would be appreciated.

RedBaron asked Aug 03 '12 05:08



1 Answer

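You can use TIMESTAMPDIFF to group by intervals of time. Because it measures the distance from a fixed datetime rather than from the Unix epoch, no time zone conversion takes place.

For a specified interval of hours, you can use: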

SELECT   '2012-08-03 00:00:00' + 
         INTERVAL FLOOR(TIMESTAMPDIFF(HOUR, '2012-08-03 00:00:00', timestamp) / <n>) * <n> HOUR AS start_time,
         COUNT(*) AS total 
FROM     event 
WHERE    timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time

Replace the occurrences of 2012-08-03 00:00:00 with your minimum input date.

<n> is your specified interval in hours (every 2 hours, 3 hours, etc.), and you can do the same for minutes:

SELECT   '2012-08-03 00:00:00' + 
         INTERVAL FLOOR(TIMESTAMPDIFF(MINUTE, '2012-08-03 00:00:00', timestamp) / <n>) * <n> MINUTE AS start_time,
         COUNT(*) AS total 
FROM     event 
WHERE    timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time

Where <n> is your specified interval in minutes (every 45 minutes, 90 minutes, etc).

Be sure you're passing in your minimum input date (in this example 2012-08-03 00:00:00) as the second parameter to TIMESTAMPDIFF.


EDIT: If you don't want to worry about which interval unit to pick in the TIMESTAMPDIFF function, then of course just do the interval by seconds (300 = 5 minutes, 3600 = 1 hour, 7200 = 2 hours, etc.)

SELECT   '2012-08-03 00:00:00' + 
         INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, '2012-08-03 00:00:00', timestamp) / <n>) * <n> SECOND AS start_time,
         COUNT(*) AS total 
FROM     event 
WHERE    timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time

EDIT2: To address your comment pertaining to reducing the number of areas in the statement where you have to pass in your minimum parameter date, you can use:

SELECT   b.mindate + 
         INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, b.mindate, timestamp) / <n>) * <n> SECOND AS start_time,
         COUNT(*) AS total 
FROM     event 
JOIN     (SELECT '2012-08-03 00:00:00' AS mindate) b ON timestamp >= b.mindate
GROUP BY start_time

And simply pass in your minimum datetime parameter once into the join subselect.

You can even add a second column to the join subselect for your interval in seconds (e.g. 3600) and name it something like secinterval. Then change each <n> to b.secinterval, so you only have to pass in your minimum date parameter AND interval one time each.
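To see why anchoring on the minimum date sidesteps the time zone problem, here is a rough Python sketch of the same arithmetic as the queries above: min date + floor(elapsed / n) * n, computed on naive datetimes only. The function names and sample events are hypothetical and only illustrate the idea; they are not part of MySQL.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts, min_date, interval_seconds):
    """Return min_date + floor((ts - min_date) / n) * n using naive datetimes."""
    elapsed = int((ts - min_date).total_seconds())
    whole_intervals = elapsed // interval_seconds
    return min_date + timedelta(seconds=whole_intervals * interval_seconds)


def count_by_interval(timestamps, min_date, interval_seconds):
    """Count events per bucket, skipping events before min_date (like the WHERE clause)."""
    return Counter(
        bucket_start(ts, min_date, interval_seconds)
        for ts in timestamps
        if ts >= min_date
    )


# Hypothetical sample events; the minimum date and interval are passed once each.
min_date = datetime(2012, 8, 3, 0, 0, 0)
events = [
    datetime(2012, 8, 2, 23, 59, 0),  # before min_date, ignored
    datetime(2012, 8, 3, 0, 10, 0),
    datetime(2012, 8, 3, 0, 59, 59),
    datetime(2012, 8, 3, 1, 0, 0),
]
for start, total in sorted(count_by_interval(events, min_date, 3600).items()):
    print(start, total)
# 2012-08-03 00:00:00 2
# 2012-08-03 01:00:00 1
```

No conversion to or from UTC happens anywhere in this calculation, so the bucket boundaries depend only on the minimum date and the interval.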



Zane Bien answered Nov 15 '22 18:11