 

Performing aggregation by date and time in SQL

I have a dataset containing several weeks of observations at a 2-minute frequency, and I want to aggregate it up to a 5-minute interval. The problem is that the observation frequency is not always consistent: in theory there should be 5 observations every 10 minutes, but usually that is not the case. How can I aggregate the observations with an average, grouped by the date and time of each observation, i.e. one average per 5-minute window even though the number of observations per window varies? The date and time are stored in timestamp format.

Example Data:

1 2007-09-14 22:56:12 5.39
2 2007-09-14 22:58:12 5.34
3 2007-09-14 23:00:12 5.16
4 2007-09-14 23:02:12 5.54
5 2007-09-14 23:04:12 5.30
6 2007-09-14 23:06:12 5.20

expected results:

1 2007-09-14 23:00 5.29
2 2007-09-14 23:05 5.34
asked Oct 22 '12 by A.Amidi



3 Answers

The answers to this related question should provide good solutions to your problem, showing ways to efficiently aggregate data into time windows.

Essentially, use the avg aggregate with:

GROUP BY floor(extract(epoch from the_timestamp) / 60 / 5)
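
For example, a minimal sketch of a complete query along those lines, assuming a table named observation with columns the_timestamp and value (substitute your own table and column names):

SELECT
  to_timestamp(floor(extract(epoch from the_timestamp) / 300) * 300) as five_minute_start, -- 300 seconds = 5 minutes
  avg(value) as avg_value
FROM observation
GROUP BY 1
ORDER BY 1;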
answered Oct 18 '22 by Craig Ringer

OK, so this is just one way to handle this. I hope it gets you thinking about how to convert the data for your analysis needs.

There's a prerequisite to testing out this code: you need a table with all possible 1-minute timestamps. There are many ways to go about this; I'll just use what I have available, which is a dim_time table holding every minute (00:01:00 through 23:59:00) and a dim_date table holding all possible dates. When you join these (on 1=1) you get all possible minutes for all possible days.

--first you need to create some functions I'll use later
--credit for this first function goes to David Walling
CREATE OR REPLACE FUNCTION dev.beginning_datetime_floor(timestamp without time zone, integer)
  RETURNS timestamp without time zone AS
$BODY$ 
SELECT
date_trunc('minute',timestamp with time zone 'epoch' + 
    floor(extract(epoch from $1)/($2*60))*$2*60
* interval '1 second') at time zone 'CST6CDT'
$BODY$
  LANGUAGE sql VOLATILE;

--the following function is what I described on my previous post  
CREATE OR REPLACE FUNCTION dev.round_minutes(timestamp without time zone, integer)
  RETURNS timestamp without time zone AS
$BODY$ 
  SELECT date_trunc('hour', $1) + cast(($2::varchar||' min') as interval) * round(date_part('minute',$1)::float / cast($2 as float)) 
$BODY$
  LANGUAGE sql VOLATILE;
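
--as a quick sanity check, round_minutes snaps the sample timestamps like this
--(the outputs follow from the arithmetic above: minute 2 rounds down, minute 4 rounds up)
SELECT dev.round_minutes(timestamp '2007-09-14 23:02:12', 5) as rounded_down, --gives 2007-09-14 23:00:00
       dev.round_minutes(timestamp '2007-09-14 23:04:12', 5) as rounded_up    --gives 2007-09-14 23:05:00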

--let's load the data into a temp table; I added some data points (note: I got rid of the partial seconds)
SELECT cast(timestamp_original as timestamp) as timestamp_original, datapoint INTO TEMPORARY TABLE timestamps_second2
FROM
(
SELECT '2007-09-14 22:56:12' as timestamp_original, 0 as datapoint
UNION
SELECT '2007-09-14 22:58:12' as timestamp_original, 1 as datapoint
UNION
SELECT '2007-09-14 23:00:12' as timestamp_original, 10 as datapoint 
UNION
SELECT '2007-09-14 23:02:12' as timestamp_original, 100 as datapoint
UNION
SELECT '2007-09-14 23:04:12' as timestamp_original, 1000 as datapoint
UNION
SELECT '2007-09-14 23:06:12' as timestamp_original, 10000 as datapoint
) as data

--this is the bit of code you'll have to replace with your own way of generating all possible minutes
--you could build a sequence of timestamps in R, or simply make the timestamps in Excel to test out the rest of the code (a generate_series alternative is sketched after this block)
--the result of the query is simply '2007-09-14 00:00:00' through '2007-09-14 23:59:00'
SELECT * INTO TEMPORARY TABLE possible_timestamps
FROM
(
select the_date + beginning_minute as minute_timestamp
FROM datawarehouse.dim_date as dim_date
JOIN datawarehouse.dim_time as dim_time
ON 1=1
where dim_date.the_date = '2007-09-14'
group by the_date, beginning_minute
order by the_date, beginning_minute
) as data
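
--if you don't have dim_date/dim_time tables, here's a possible alternative for this step:
--generate_series (PostgreSQL 8.4+) can build the same set of minutes directly
--(run this instead of the query above, not in addition to it)
SELECT minute_timestamp INTO TEMPORARY TABLE possible_timestamps
FROM generate_series(
       timestamp '2007-09-14 00:00:00',
       timestamp '2007-09-14 23:59:00',
       interval '1 minute') as gs(minute_timestamp)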

--round to the nearest minute (be sure to think about how this might change your results)
SELECT * INTO TEMPORARY TABLE rounded_timestamps2
FROM
(
SELECT dev.round_minutes(timestamp_original,1) as minute_timestamp_rounded, datapoint
from timestamps_second2
) as data

--let's join what minutes we have data for versus the possible minutes
--I used some subqueries so when you select all from the table you'll see the important part (not needed)
SELECT * INTO TEMPORARY TABLE joined_with_possibles
FROM
(
SELECT *
FROM
(
SELECT *, (MIN(minute_timestamp_rounded) OVER ()) as min_time, (MAX(minute_timestamp_rounded) OVER ()) as max_time
FROM possible_timestamps as t1
LEFT JOIN rounded_timestamps2 as t2
ON t1.minute_timestamp = t2.minute_timestamp_rounded
ORDER BY t1.minute_timestamp asc
) as inner_query
WHERE minute_timestamp >= min_time
AND minute_timestamp <= max_time
) as data

--here's the tricky part that might not suit your needs, but it's one method
--if a value is missing, it grabs the previous value
--if the prior value is also missing, it grabs the one before that; otherwise it's null
--best practice would be to add another case statement returning 0, 1, or 2 to flag which point was pulled, so you can count those when you aggregate
SELECT * INTO TEMPORARY TABLE shifted_values
FROM
(
SELECT 
*,
case 
when datapoint is not null then datapoint
when datapoint is null and (lag(datapoint,1) over (order by minute_timestamp asc)) is not null
  then lag(datapoint,1) over (order by minute_timestamp asc)
when datapoint is null and (lag(datapoint,1) over (order by minute_timestamp asc)) is null and (lag(datapoint,2) over (order by minute_timestamp asc)) is not null
  then lag(datapoint,2) over (order by minute_timestamp asc)
else null end as last_good_value
from joined_with_possibles
ORDER BY minute_timestamp asc
) as data

--now we use the function from my previous post to make the timestamps to aggregate on
SELECT * INTO TEMPORARY TABLE shifted_values_with_five_minute
FROM
(
SELECT *, dev.beginning_datetime_floor(minute_timestamp,5) as five_minute_timestamp
FROM shifted_values
) as data

--finally we aggregate (averaging the gap-filled column so missing minutes use the carried-forward values)
SELECT
AVG(last_good_value) as avg_datapoint, five_minute_timestamp
FROM shifted_values_with_five_minute
GROUP BY five_minute_timestamp
answered Oct 18 '22 by ideamotor


By far the simplest option is to create a reference table. In that table you store the intervals you are interested in:

(Adapt this to your own RDBMS's date notation.)

CREATE TABLE interval (
  start_time    DATETIME,
  cease_time    DATETIME
);
INSERT INTO interval SELECT '2012-10-22 12:00', '2012-10-22 12:05';
INSERT INTO interval SELECT '2012-10-22 12:05', '2012-10-22 12:10';
INSERT INTO interval SELECT '2012-10-22 12:10', '2012-10-22 12:15';
INSERT INTO interval SELECT '2012-10-22 12:15', '2012-10-22 12:20';
INSERT INTO interval SELECT '2012-10-22 12:20', '2012-10-22 12:25';
INSERT INTO interval SELECT '2012-10-22 12:25', '2012-10-22 12:30';
INSERT INTO interval SELECT '2012-10-22 12:30', '2012-10-22 12:35';
INSERT INTO interval SELECT '2012-10-22 12:35', '2012-10-22 12:40';
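
If you'd rather not type the INSERT statements by hand, most databases can generate the rows for you. For example, a sketch for PostgreSQL (where the columns would be declared as timestamp rather than DATETIME), covering the same day in 5-minute steps:

INSERT INTO interval (start_time, cease_time)
SELECT ts, ts + interval '5 minutes'
FROM generate_series(timestamp '2012-10-22 00:00',
                     timestamp '2012-10-22 23:55',
                     interval '5 minutes') AS gs(ts);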

Then you just join and aggregate...

SELECT
  interval.start_time,
  AVG(observation.value)
FROM
  interval
LEFT JOIN
  observation
    ON  observation.timestamp >= interval.start_time
    AND observation.timestamp <  interval.cease_time
GROUP BY
  interval.start_time

NOTE: You only need to create and populate that intervals table once, then you can re-use it many times.

answered Oct 18 '22 by MatBailie