Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted moving average in Amazon Redshift

Is there a way to calculate a weighted moving average with a fixed window size in Amazon Redshift? In more detail, given a table with a date column and a value column, for each date compute the weighted average value over a window of a specified size, with weights specified in an auxiliary table.

My search attempts so far yielded plenty of examples for doing this with window functions for simple average (without weights), for example here. There are also some related suggestions for postgres, e.g., this SO question, however Redshift's feature set is quite sparse compared with postgres and it doesn't support many of the advanced features that are suggested.

like image 783
ytsaig Avatar asked Nov 10 '22 19:11

ytsaig


1 Answers

Assuming we have the following tables:

create temporary table _data (ref_date date, value int);
insert into _data values
    ('2016-01-01', 34)
  , ('2016-01-02', 12)
  , ('2016-01-03', 25)
  , ('2016-01-04', 17)
  , ('2016-01-05', 22)
;

create temporary table _weight (days_in_past int, weight int);
insert into _weight values
    (0, 4)
  , (1, 2)
  , (2, 1)
;

Then, if we want to calculate a moving average over a window of three days (including the current date) where values closer to the current date are assigned a higher weight than those further in the past, we'd expect for the weighted average for 2016-01-05 (based on values from 2016-01-05, 2016-01-04 and 2016-01-03):

(22*4 + 17*2 + 25*1) / (4+2+1) = 147 / 7 = 21

And the query could look as follows:

with _prepare_window as (
    select
        t1.ref_date
      , datediff(day, t2.ref_date, t1.ref_date) as days_in_past
      , t2.value * weight as weighted_value
      , weight
      , count(t2.ref_date) over(partition by t1.ref_date rows between unbounded preceding and unbounded following) as num_values_in_window
    from
        _data t1
    left join
        _data t2 on datediff(day, t2.ref_date, t1.ref_date) between 0 and 2
    left join
        _weight on datediff(day, t2.ref_date, t1.ref_date) = days_in_past
    order by
        t1.ref_date
      , datediff(day, t2.ref_date, t1.ref_date)
)
select
    ref_date
  , round(sum(weighted_value)::float/sum(weight), 0) as weighted_average
from
    _prepare_window
where
    num_values_in_window = 3
group by
    ref_date
order by
    ref_date
;

Giving the result:

  ref_date  | weighted_average
------------+------------------
 2016-01-03 |               23
 2016-01-04 |               19
 2016-01-05 |               21
(3 rows)
like image 143
moertel Avatar answered Dec 30 '22 08:12

moertel