Calculating running sum starting x years before

I have a table with entity name, year and activity number, as shown below. Some years have no activity at all.

name | year | act_num
-----+------+---------
aa   | 2000 |       2
aa   | 2001 |       6
aa   | 2002 |       9
aa   | 2003 |      15
aa   | 2005 |      17
b    | 2000 |       3
b    | 2002 |       4
b    | 2003 |       9
b    | 2005 |      12
b    | 2006 |       2

To create it in PostgreSQL:

CREATE TABLE entity_year_activity (
name character varying(10),
year integer,
act_num integer
);

INSERT INTO entity_year_activity
VALUES
    ('aa', 2000, 2),
    ('aa', 2001, 6),
    ('aa', 2002, 9),
    ('aa', 2003, 15),
    ('aa', 2005, 17),
    ('b', 2000, 3),
    ('b', 2002, 4),
    ('b', 2003, 9),
    ('b', 2005, 12),
    ('b', 2006, 2);

For each entity and each year, I would like the total activity over the x-year window ending in that year, that is, this year's activity plus the activity of the preceding x - 1 years, as shown below.

For example, with x = 3:

name | year | act_num | total_3_years
-----+------+---------+---------------
aa   | 2000 |       2 |      2
aa   | 2001 |       6 |      8
aa   | 2002 |       9 |     17
aa   | 2003 |      15 |     30
aa   | 2004 |       0 |     24
aa   | 2005 |      17 |     32
b    | 2000 |       3 |      3
b    | 2001 |       0 |      3
b    | 2002 |       4 |      7
b    | 2003 |       9 |     13
b    | 2005 |      12 |     21
b    | 2006 |       2 |     14

heimatlos asked Oct 25 '12 13:10


2 Answers

Here's an approach that uses the sum aggregate as a window function with a row-based window frame; see SUM(...) OVER (PARTITION BY name ORDER BY year ROWS 2 PRECEDING) and the PostgreSQL documentation on window framing.

-- All (name, year) pairs between the earliest and latest year in the table
WITH name_years(gen_name, gen_year) AS (
  SELECT gen_name, s
  FROM generate_series(
    (SELECT min(year) FROM entity_year_activity),
    (SELECT max(year) FROM entity_year_activity)
  ) s CROSS JOIN (SELECT DISTINCT name FROM entity_year_activity) n(gen_name)
),
-- Fill missing years with 0, then sum the current row and the 2 rows before it
windowed_history(name, year, act_num, last3_actnum) AS (
  SELECT
    gen_name, gen_year, coalesce(act_num, 0),
    SUM(coalesce(act_num, 0)) OVER (PARTITION BY gen_name ORDER BY gen_year ROWS 2 PRECEDING)
  FROM name_years
  LEFT OUTER JOIN entity_year_activity ON (gen_name = name AND gen_year = year)
)
-- Each (name, year) pair is unique here, so sum() runs over a single row;
-- HAVING drops the rows whose 3-year total is zero
SELECT name, year, act_num, sum(last3_actnum) AS total_3_years
FROM windowed_history
GROUP BY name, year, act_num
HAVING sum(last3_actnum) <> 0
ORDER BY name, year;

See SQLFiddle.

Generating rows for years that have no entry of their own is what complicates this query. I build a table of all (name, year) pairs, then left outer join entity_year_activity onto it before computing the window sum, so that every year is represented for every name. Finally, I filter the result to exclude rows whose sum is zero.
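
If the gap years do not have to appear as rows in the output, a value-based frame is one way to avoid generating them. What follows is a minimal sketch, not the query used above, and it assumes PostgreSQL 11 or later, where RANGE frames accept a numeric offset. The frame is defined over year values rather than row positions, so a missing year simply contributes nothing to the sum, but rows such as (aa, 2004) are not produced.

```sql
-- Sketch, assuming PostgreSQL 11+ (RANGE with an offset is not available earlier).
-- The frame covers rows whose year lies in [year - 2, year], so gaps in the
-- data do not pull in older years the way a ROWS frame would.
SELECT
  name,
  year,
  act_num,
  sum(act_num) OVER (
    PARTITION BY name
    ORDER BY year
    RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS total_3_years
FROM entity_year_activity
ORDER BY name, year;
```

For the rows present in the sample data this gives the same totals as the question's example, for instance 32 for (aa, 2005) and 14 for (b, 2006).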

Craig Ringer answered Nov 15 '22 10:11


SQL Fiddle

-- One lag() term per earlier year in the window; a default of 0 covers
-- the first rows of each partition, which have no predecessor
select
    s.name,
    d "year",
    coalesce(act_num, 0) act_num,
    coalesce(act_num, 0)
    + lag(coalesce(act_num, 0), 1, 0) over(partition by s.name order by d)
    + lag(coalesce(act_num, 0), 2, 0) over(partition by s.name order by d)
    total_3_years
from
    entity_year_activity eya
    -- s holds every (year, name) pair in the overall year range
    right join (
        generate_series(
            (select min("year") from entity_year_activity),
            (select max("year") from entity_year_activity)
        ) d cross join (
        select distinct name
        from entity_year_activity
        ) f
    ) s on s.name = eya.name and s.d = eya."year"
order by s.name, d;
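
The lag-based query above needs one lag(...) term per extra year in the window. As a sketch that is not part of the original answer, the same grid-and-join structure can use a window frame instead, so that changing x means changing a single number. The aliases n, g, e and the column name gen_year are illustrative choices.

```sql
-- Sketch: one row per (name, year) in the overall year range, no zero filter.
-- For a window of x years, use ROWS BETWEEN (x - 1) PRECEDING AND CURRENT ROW.
SELECT
  n.name,
  g.gen_year AS year,
  coalesce(e.act_num, 0) AS act_num,
  sum(coalesce(e.act_num, 0)) OVER (
    PARTITION BY n.name
    ORDER BY g.gen_year
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- x = 3
  ) AS total_3_years
FROM (SELECT DISTINCT name FROM entity_year_activity) AS n
CROSS JOIN generate_series(
  (SELECT min(year) FROM entity_year_activity),
  (SELECT max(year) FROM entity_year_activity)
) AS g(gen_year)
LEFT JOIN entity_year_activity AS e
  ON e.name = n.name AND e.year = g.gen_year
ORDER BY n.name, g.gen_year;
```

The row-based frame is only valid here because the generated grid has exactly one row per year for each name; applied directly to the sparse table it would reach back past the gaps.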
Clodoaldo Neto answered Nov 15 '22 10:11