
SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc)

I have a table (MySQL) that captures samples every n seconds. The table has many columns, but all that matters for this is two: a time stamp (of type TIMESTAMP) and a count (of type INT).

What I would like to do is get sums and averages of the count column over a range of times. For instance, samples are recorded every 2 seconds, but I would like the sum of the count column over each 10-second or 30-second window, across all samples.

Here's an example of the data:

+---------------------+-----------------+
| time_stamp          | count           |
+---------------------+-----------------+
| 2010-06-15 23:35:28 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |             942 |
| 2010-06-15 23:35:30 |             180 |
| 2010-06-15 23:35:30 |               4 |
| 2010-06-15 23:35:30 |              52 |
| 2010-06-15 23:35:30 |              12 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:33 |            1468 |
| 2010-06-15 23:35:33 |             247 |
| 2010-06-15 23:35:33 |               1 |
| 2010-06-15 23:35:33 |              81 |
| 2010-06-15 23:35:33 |              16 |
| 2010-06-15 23:35:35 |            1828 |
| 2010-06-15 23:35:35 |             214 |
| 2010-06-15 23:35:35 |              75 |
| 2010-06-15 23:35:35 |               8 |
| 2010-06-15 23:35:37 |            1799 |
| 2010-06-15 23:35:37 |              24 |
| 2010-06-15 23:35:37 |              11 |
| 2010-06-15 23:35:37 |               2 |
| 2010-06-15 23:35:40 |             575 |
| 2010-06-15 23:35:40 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:40 |              35 |
| 2010-06-17 10:39:40 |              19 |
| 2010-06-17 10:39:40 |              37 |
| 2010-06-17 10:39:42 |              64 |
| 2010-06-17 10:39:42 |               3 |
| 2010-06-17 10:39:42 |              31 |
| 2010-06-17 10:39:42 |               7 |
| 2010-06-17 10:39:42 |             246 |
+---------------------+-----------------+

The output I would like (based on the data above) should look like this:

+---------------------+-----------------+
| 2010-06-15 23:35:00 |               1 |  # This is the sum for the 00 - 30 seconds range
| 2010-06-15 23:35:30 |            7544 |  # This is the sum for the 30 - 60 seconds range
| 2010-06-17 10:39:30 |             450 |  # This is the sum for the 30 - 60 seconds range
+---------------------+-----------------+

I have used GROUP BY to gather these numbers by the second, or by the minute, but I can't seem to figure out the syntax to get the sub-minute or range of seconds GROUP BY commands to work correctly.

I am mostly going to be using this query to syphon data from this table to another table.

Thanks!

Eric Anderson asked Jun 21 '10 16:06


2 Answers

GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 30

or, if for some reason you wanted to group them in 20-second intervals, it would be DIV 20, and so on. To change the boundaries between GROUP BY values you could use

GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30

where r is a literal nonnegative integer less than 30. So

GROUP BY (UNIX_TIMESTAMP(time_stamp) + 5) DIV 30

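should give you sums between hh:mm:25 and hh:mm:55 and between hh:mm:55 and hh:mm+1:25. Adding r moves each boundary r seconds earlier, so for boundaries at hh:mm:05 and hh:mm:35 you would use r = 25.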
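For completeness, here is a minimal sketch of how the GROUP BY expression might fit into a full query. The table name `samples` and the aliases are hypothetical (the question does not name the table); the column names are the ones from the question. Multiplying the bucket number back by 30 and passing it to FROM_UNIXTIME gives the start of each 30-second window.

```sql
-- Sketch only: `samples` is a hypothetical table name.
-- `count` is quoted with backticks because it is also a function name.
SELECT
    -- Floor the timestamp to the start of its 30-second bucket
    FROM_UNIXTIME((UNIX_TIMESTAMP(time_stamp) DIV 30) * 30) AS bucket_start,
    SUM(`count`) AS total,
    AVG(`count`) AS average
FROM samples
GROUP BY bucket_start
ORDER BY bucket_start;
```

For the sample rows in the question this should produce three buckets with sums 1, 7544 and 450, assuming the session time zone offset is a whole number of minutes. Since the asker wants to copy the results to another table, the same SELECT could presumably be used as the source of an INSERT ... SELECT.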

Hammerite answered Sep 23 '22 12:09


I tried Hammerite's solution in my project, but it didn't work well where there were missing samples from the series. Here's an example of the query that is supposed to select timestamp (ts), user name and average measure from metric_table and group the results by 27-minute time intervals:

select
    min(ts),
    user_name,
    sum(measure) / 27
from metric_table
where
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00'
group by unix_timestamp(ts) div 1620, user_name
order by ts, user_name
;

Note: 27 minutes (in select) = 1620 seconds (in group by), 2160 minutes = 36 hours (that's the time range)

When I ran this query against a time series where samples were irregularly recorded (in other words: for any given time stamp there was no guarantee to find measure values for all user names) the results were not stamped according to the interval (were not placed every 27 minutes). I suspect that was due to min(ts) returning a time stamp in some groups that was greater than the expected floor(ts0 + i*interval). I modified the former query to this one:

select
    from_unixtime(unix_timestamp(ts) - unix_timestamp(ts) mod 1620) as ts1,
    user_name,
    sum(measure) / 27
from metric_table
where
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00'
group by ts1, user_name
order by ts1, user_name
;

and it works fine even when the samples are missing. I think that is because once the time math is moved to select it guarantees that ts1 will align with the time steps.
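To see the difference between the two approaches, a query along the following lines could list the computed bucket start next to the earliest sample actually present in each bucket. This is only a sketch against the same metric_table; the aliases bucket_start, first_sample and samples_in_bucket are made up for illustration. It uses (x div 1620) * 1620, which equals x - x mod 1620 for a non-negative integer x.

```sql
-- Sketch only: compares the aligned bucket start with min(ts) per group.
select
    -- Always a multiple of 1620 seconds since the Unix epoch
    from_unixtime((unix_timestamp(ts) div 1620) * 1620) as bucket_start,
    -- Earliest sample that happens to exist in the bucket; may be later
    min(ts) as first_sample,
    user_name,
    count(*) as samples_in_bucket
from metric_table
group by bucket_start, user_name
order by bucket_start, user_name
;
```

Wherever a user has no sample at the very start of a bucket, first_sample will be later than bucket_start, which matches the misaligned stamps described above. Note also that these buckets are aligned to the Unix epoch rather than to local midnight: 86400 is not a multiple of 1620, so the bucket boundaries fall at different clock times on consecutive days.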

mac13k answered Sep 22 '22 12:09