I have my measurement data stored in the following structure:
CREATE TABLE measurements( measured_at TIMESTAMPTZ, val INTEGER );
I already know that using
(a) date_trunc('hour', measured_at)
and
(b) generate_series
I can aggregate my data by fixed units:
microseconds, milliseconds, seconds, minutes, hours, and so on.
But is it possible to aggregate the data by 5 minutes, or, more generally, by an arbitrary number of seconds?
I need the data aggregated at different time resolutions so I can feed it into an FFT or an AR model to look for possible seasonalities.
In PostgreSQL, the interval data type stores a span of time expressed in years, months, days, hours, minutes, seconds, etc. The months and days fields are integers, while the seconds field can hold fractional values. An interval occupies 16 bytes and ranges from -178,000,000 years to 178,000,000 years. The make_interval() function builds an interval from years, months, weeks, days, hours, minutes, and seconds arguments: supply any of those fields and it returns a value of type interval.
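As a quick sketch of the point above, the following query shows three equivalent ways to express a 5-minute bucket width; the column aliases are just illustrative names:

```sql
-- Three equivalent ways to build a 5-minute interval in PostgreSQL:
SELECT make_interval(mins => 5)       AS via_function,  -- named-argument form
       '5 minutes'::interval          AS via_literal,   -- interval literal
       (5 || ' minutes')::interval    AS via_concat;    -- string concatenation, handy
                                                        -- when the width is a variable
```

All three produce the same interval value, so whichever reads best in your query is fine.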
You can generate a table of "buckets" by adding intervals created by generate_series(). This SQL statement generates a table of five-minute buckets for the first day (the day of min(measured_at)) in your data.
select (select min(measured_at)::date from measurements) + (n || ' minutes')::interval start_time,
       (select min(measured_at)::date from measurements) + ((n + 5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n;
Wrap that statement in a common table expression, and you can join and group on it as if it were a base table.
with five_min_intervals as (
  select (select min(measured_at)::date from measurements) + (n || ' minutes')::interval start_time,
         (select min(measured_at)::date from measurements) + ((n + 5) || ' minutes')::interval end_time
  from generate_series(0, (24*60), 5) n
)
select f.start_time, f.end_time, avg(m.val) avg_val
from measurements m
right join five_min_intervals f
  on m.measured_at >= f.start_time and m.measured_at < f.end_time
group by f.start_time, f.end_time
order by f.start_time;
Grouping by an arbitrary number of seconds is similar: use date_trunc().
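If you would rather skip the bucket table entirely, a common alternative (a sketch, not part of the answer above) is to floor the Unix epoch of each timestamp to a multiple of the bucket width; here 300 seconds gives 5-minute buckets:

```sql
-- Bucket by an arbitrary number of seconds (300 s = 5 min) by flooring
-- the epoch to a multiple of the bucket width, then converting back.
SELECT to_timestamp(floor(extract(epoch FROM measured_at) / 300) * 300) AS bucket_start,
       avg(val) AS avg_val
FROM measurements
GROUP BY bucket_start
ORDER BY bucket_start;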
A more general use of generate_series() lets you avoid guessing the upper limit for five-minute buckets. In practice, you'd probably build this as a view or a function. You might get better performance from a base table.
select (select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time, (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time from generate_series(0, ((select max(measured_at)::date - min(measured_at)::date from measurements) + 1)*24*60, 5) n;
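To build this as a view, as suggested above, you can wrap the bucket generator so queries join against it without repeating the generate_series() boilerplate; the view name here is hypothetical:

```sql
-- Sketch: persist the bucket generator as a view covering the whole
-- date range of the measurements table, in 5-minute steps.
CREATE VIEW five_min_buckets AS
SELECT (SELECT min(measured_at)::date FROM measurements) + (n || ' minutes')::interval AS start_time,
       (SELECT min(measured_at)::date FROM measurements) + ((n + 5) || ' minutes')::interval AS end_time
FROM generate_series(
       0,
       ((SELECT max(measured_at)::date - min(measured_at)::date FROM measurements) + 1) * 24 * 60,
       5) AS n;
```

As the answer notes, a materialized base table of buckets may perform better than recomputing the series on every query.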
Catcall has a great answer. My example of using it demonstrates fixed buckets, in this case 30-minute intervals starting at midnight. It also shows how one extra bucket can be generated in Catcall's first version, and how to eliminate it; I wanted exactly 48 buckets in a day. In my problem, observations have separate date and time columns, and I want to average the observations within each 30-minute period across the month for a number of different services.
with intervals as (
  select (n || ' minutes')::interval as start_time,
         ((n + 30) || ' minutes')::interval as end_time
  from generate_series(0, (23*60 + 30), 30) n
)
select i.start_time, o.service, avg(o.o)
from observations o
right join intervals i
  on o.time >= i.start_time and o.time < i.end_time
where o.date between '2013-01-01' and '2013-01-31'
group by i.start_time, i.end_time, o.service
order by i.start_time;