Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Moving Average based on Timestamps in PostgreSQL

Tags:

I wanted to perform moving average through timestamps. I have two columns: Temperature and timestamps (time-date) and I want to perform the moving average based on every 15 minutes successive temperature observations. In other words, selecting data to perform the average based on 15 minutes time interval. Moreover, it is possible to have different number of observations for different time sequences. I meant all the window sizes are equal (15 minutes) but it is possible to have different number of observations in each window. For example: For a first window we have to calculate the average of n observation and for second window calculate the average of the observation for n+5 observation.

Data Sample:

ID   Timestamps          Temperature
1    2007-09-14 22:56:12 5.39
2    2007-09-14 22:58:12 5.34
3    2007-09-14 23:00:12 5.16
4    2007-09-14 23:02:12 5.54
5    2007-09-14 23:04:12 5.30
6    2007-09-14 23:06:12 5.20
7    2007-09-14 23:10:12 5.39
8    2007-09-14 23:12:12 5.34
9    2007-09-14 23:20:12 5.16
10   2007-09-14 23:24:12 5.54
11   2007-09-14 23:30:12 5.30
12   2007-09-14 23:33:12 5.20
13   2007-09-14 23:40:12 5.39
14   2007-09-14 23:42:12 5.34
15   2007-09-14 23:44:12 5.16
16   2007-09-14 23:50:12 5.54
17   2007-09-14 23:52:12 5.30
18   2007-09-14 23:57:12 5.20

Main Challenges:

How I can learn the code to discriminate every 15 minute while there are not exact 15 minutes time intervals due to different sampling frequency.

like image 432
A.Amidi Avatar asked Dec 11 '12 10:12

A.Amidi


3 Answers

You can join your table with itself:

select l1.id, avg( l2.Temperature )
from l l1
inner join l l2 
   on l2.id <= l1.id and
      l2.Timestamps + interval '15 minutes' > l1.Timestamps
group by l1.id
order by id
;

Results:

| ID |            AVG |
-----------------------
|  1 |           5.39 |
|  2 |          5.365 |
|  3 | 5.296666666667 |
|  4 |         5.3575 |
|  5 |          5.346 |
|  6 | 5.321666666667 |
|  7 | 5.331428571429 |

Notice: Only 'hard work' is made. You should join result with original table or append new columns to query. I don't know your final query needed. Adapt this solution or ask for more help.

like image 129
dani herrera Avatar answered Sep 20 '22 11:09

dani herrera


Assuming you want to restart the rolling average after each 15 minute interval:

select id, 
       temp,
       avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (       
  select id, 
         temp,
         time_read, 
         interval_group,
         id - row_number() over (partition by interval_group order by time_read) as group_nr
  from (
    select id, 
           time_read, 
           'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
           temp
    from readings
  ) t1
) t2
order by time_read;

It is based on Depesz's solution to group by "time ranges":

Here is an SQLFiddle example: http://sqlfiddle.com/#!1/0f3f0/2

like image 34
a_horse_with_no_name Avatar answered Sep 19 '22 11:09

a_horse_with_no_name


Here's an approach that utilises the facility to use an aggregation function as a window function. The aggregate function keeps the last 15 minutes' worth of observations in an array, along with the current running total. The state transition function shifts elements off the array that have fallen behind the 15-minute window, and pushes on the latest observation. The final function simply computes the mean temperature in the array.

Now, as to whether this is a benefit or not... it depends. It focuses on the plgpsql-execution part of postgresql rather than database-access part, and my own experiences is that plpgsql is not fast. If you can easily do lookups back to the table to find the previous 15 minutes' rows for each observation, a self-join (as in @danihp answer) will do well. However, this approach can deal with the observations coming from some more complex source, where those lookups aren't practical. As ever, trial and compare on your own system.

-- based on using this table definition
create table observation(id int primary key, timestamps timestamp not null unique,
                         temperature numeric(5,2) not null);

-- note that I'm reusing the table structure as a type for the state here
create type rollavg_state as (memory observation[], total numeric(5,2));

create function rollavg_func(state rollavg_state, next_in observation) returns rollavg_state immutable language plpgsql as $$
declare
  cutoff timestamp;
  i int;
  updated_memory observation[];
begin
  raise debug 'rollavg_func: state=%, next_in=%', state, next_in;
  cutoff := next_in.timestamps - '15 minutes'::interval;
  i := array_lower(state.memory, 1);
  raise debug 'cutoff is %', cutoff;
  while i <= array_upper(state.memory, 1) and state.memory[i].timestamps < cutoff loop
    raise debug 'shifting %', state.memory[i].timestamps;
    i := i + 1;
    state.total := state.total - state.memory[i].temperature;
  end loop;
  state.memory := array_append(state.memory[i:array_upper(state.memory, 1)], next_in);
  state.total := coalesce(state.total, 0) + next_in.temperature;
  return state;
end
$$;

create function rollavg_output(state rollavg_state) returns float8 immutable language plpgsql as $$
begin
  raise debug 'rollavg_output: state=% len=%', state, array_length(state.memory, 1);
  if array_length(state.memory, 1) > 0 then
    return state.total / array_length(state.memory, 1);
  else
    return null;
  end if;
end
$$;

create aggregate rollavg(observation) (sfunc = rollavg_func, finalfunc = rollavg_output, stype = rollavg_state);

-- referring to just a table name means a tuple value of the row as a whole, whose type is the table type
-- the aggregate relies on inputs arriving in ascending timestamp order
select rollavg(observation) over (order by timestamps) from observation;
like image 40
araqnid Avatar answered Sep 19 '22 11:09

araqnid