I have a table (MySQL) that captures samples every n seconds. The table has many columns, but all that matters for this is two: a time stamp (of type TIMESTAMP) and a count (of type INT).
What I would like to do, is get sums and averages of the count column over a range of times. For instance, I have samples every 2 seconds recorded, but I would like the sum of the count column for all the samples in a 10 second or 30 second window for all samples.
Here's an example of the data:
+---------------------+-----------------+ | time_stamp | count | +---------------------+-----------------+ | 2010-06-15 23:35:28 | 1 | | 2010-06-15 23:35:30 | 1 | | 2010-06-15 23:35:30 | 1 | | 2010-06-15 23:35:30 | 942 | | 2010-06-15 23:35:30 | 180 | | 2010-06-15 23:35:30 | 4 | | 2010-06-15 23:35:30 | 52 | | 2010-06-15 23:35:30 | 12 | | 2010-06-15 23:35:30 | 1 | | 2010-06-15 23:35:30 | 1 | | 2010-06-15 23:35:33 | 1468 | | 2010-06-15 23:35:33 | 247 | | 2010-06-15 23:35:33 | 1 | | 2010-06-15 23:35:33 | 81 | | 2010-06-15 23:35:33 | 16 | | 2010-06-15 23:35:35 | 1828 | | 2010-06-15 23:35:35 | 214 | | 2010-06-15 23:35:35 | 75 | | 2010-06-15 23:35:35 | 8 | | 2010-06-15 23:35:37 | 1799 | | 2010-06-15 23:35:37 | 24 | | 2010-06-15 23:35:37 | 11 | | 2010-06-15 23:35:37 | 2 | | 2010-06-15 23:35:40 | 575 | | 2010-06-15 23:35:40 | 1 | | 2010-06-17 10:39:35 | 2 | | 2010-06-17 10:39:35 | 2 | | 2010-06-17 10:39:35 | 1 | | 2010-06-17 10:39:35 | 2 | | 2010-06-17 10:39:35 | 1 | | 2010-06-17 10:39:40 | 35 | | 2010-06-17 10:39:40 | 19 | | 2010-06-17 10:39:40 | 37 | | 2010-06-17 10:39:42 | 64 | | 2010-06-17 10:39:42 | 3 | | 2010-06-17 10:39:42 | 31 | | 2010-06-17 10:39:42 | 7 | | 2010-06-17 10:39:42 | 246 | +---------------------+-----------------+
The output I would like (based on the data above) should look like this:
+---------------------+-----------------+ | 2010-06-15 23:35:00 | 1 | # This is the sum for the 00 - 30 seconds range | 2010-06-15 23:35:30 | 7544 | # This is the sum for the 30 - 60 seconds range | 2010-06-17 10:39:35 | 450 | # This is the sum for the 30 - 60 seconds range +---------------------+-----------------+
I have used GROUP BY to gather these numbers by the second, or by the minute, but I can't seem to figure out the syntax to get the sub-minute or range of seconds GROUP BY commands to work correctly.
I am mostly going to be using this query to syphon data from this table to another table.
Thanks!
GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 30
or say for some reason you wanted to group them in 20-second intervals it would be DIV 20
etc. To change the boundaries between GROUP BY
values you could use
GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30
where r
is a literal nonnegative integer less than 30. So
GROUP BY (UNIX_TIMESTAMP(time_stamp) + 5) DIV 30
should give you sums between hh:mm:05 and hh:mm:35 and between hh:mm:35 and hh:mm+1:05.
I tried Hammerite's solution in my project, but it didn't work well where there were missing samples from the series. Here's an example of the query that is supposed to select timestamp (ts), user name and average measure from metric_table and group the results by 27-minute time intervals:
select min(ts), user_name, sum(measure) / 27 from metric_table where ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' group by unix_timestamp(ts) div 1620, user_name order by ts, user_name ;
Note: 27 minutes (in select) = 1620 seconds (in group by), 2160 minutes = 3 days (that's the time range)
When I ran this query against a time series where samples were irregularly recorded (in other words: for any given time stamp there was no guarantee to find measure values for all user names) the results were not stamped according to the interval (were not placed every 27 minutes). I suspect that was due to min(ts) returning a time stamp in some groups that was greater than the expected floor(ts0 + i*interval). I modified the former query to this one:
select from_unixtime(unix_timestamp(ts) - unix_timestamp(ts) mod 1620) as ts1, user_name, sum(measure) / 27 from metric_table where ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' group by ts1, user_name order by ts1, user_name ;
and it works fine even when the samples are missing. I think that is because once the time math is moved to select it guarantees that ts1 will align with the time steps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With