Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql SQL GROUP BY time interval with arbitrary accuracy (down to milli seconds)

I have my measurement data stored into the following structure:

CREATE TABLE measurements( measured_at TIMESTAMPTZ, val INTEGER ); 

I already know that using

(a) date_trunc('hour',measured_at)

AND

(b) generate_series

I would be able to aggregate my data by:

microseconds, milliseconds . . . 

But is it possible to aggregate the data by 5 minutes or let's say an arbitrary amount of seconds? Is it possible to aggregate measured data by an arbitrary multiple of seconds?

I need the data aggregated by different time resolutions to feed them into a FFT or an AR-Model in order to see possible seasonalities.

like image 568
user1612798 Avatar asked Aug 20 '12 21:08

user1612798


People also ask

How interval works in PostgreSQL?

In PostgreSQL, the Interval is another type of data type used to store and deploy Time in years, months, days, hours, minutes, seconds, etc. And the months and days values are integers values, whereas the second's field can be the fractions values.

How do I create an interval in PostgreSQL?

In PostgreSQL, the make_interval() function creates an interval from years, months, weeks, days, hours, minutes and seconds fields. You provide the years, months, weeks, days, hours, minutes and/or seconds fields, and it will return an interval in the interval data type.

Is interval data type in PostgreSQL?

In PostgreSQL the interval data type is used to store and manipulate a time period. It holds 16 bytes of space and ranging from -178, 000, 000 years to 178, 000, 000 years.


2 Answers

You can generate a table of "buckets" by adding intervals created by generate_series(). This SQL statement will generate a table of five-minute buckets for the first day (the value of min(measured_at)) in your data.

select    (select min(measured_at)::date from measurements) + ( n    || ' minutes')::interval start_time,   (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time from generate_series(0, (24*60), 5) n 

Wrap that statement in a common table expression, and you can join and group on it as if it were a base table.

with five_min_intervals as (   select      (select min(measured_at)::date from measurements) + ( n    || ' minutes')::interval start_time,     (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time   from generate_series(0, (24*60), 5) n ) select f.start_time, f.end_time, avg(m.val) avg_val  from measurements m right join five_min_intervals f          on m.measured_at >= f.start_time and m.measured_at < f.end_time group by f.start_time, f.end_time order by f.start_time 

Grouping by an arbitrary number of seconds is similar--use date_trunc().


A more general use of generate_series() lets you avoid guessing the upper limit for five-minute buckets. In practice, you'd probably build this as a view or a function. You might get better performance from a base table.

select    (select min(measured_at)::date from measurements) + ( n    || ' minutes')::interval start_time,   (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time from generate_series(0, ((select max(measured_at)::date - min(measured_at)::date from measurements) + 1)*24*60, 5) n; 
like image 77
Mike Sherrill 'Cat Recall' Avatar answered Sep 19 '22 07:09

Mike Sherrill 'Cat Recall'


Catcall has a great answer. My example of using it demonstrates having fixed buckets - in this case 30 minute intervals starting at midnight. It also shows that there can be one extra bucket generated in Catcall's first version and how to eliminate it. I wanted exactly 48 buckets in a day. In my problem, observations have separate date and time columns and I want to average the observations within a 30 minute period across the month for a number of different services.

with intervals as (     select         (n||' minutes')::interval as start_time,          ((n+30)|| ' minutes')::interval as end_time     from generate_series(0, (23*60+30), 30) n ) select i.start_time, o.service, avg(o.o) from observations o right join intervals i on o.time >= i.start_time and o.time < i.end_time where o.date between '2013-01-01' and '2013-01-31' group by i.start_time, i.end_time, o.service order by i.start_time 
like image 32
Julian Avatar answered Sep 21 '22 07:09

Julian