I want to compute the cumulative (running) total of a column and insert it from a staging table into a target table. My staging table looks something like this:
    ea_month   id     amount  ea_year  circle_id
    April      92570  1000    2014     1
    April      92571  3000    2014     2
    April      92572  2000    2014     3
    March      92573  3000    2014     1
    March      92574  2500    2014     2
    March      92575  3750    2014     3
    February   92576  2000    2014     1
    February   92577  2500    2014     2
    February   92578  1450    2014     3
I want my target table to look something like this:
    ea_month   id     amount  ea_year  circle_id  cum_amt
    February   92576  1000    2014     1          1000
    March      92573  3000    2014     1          4000
    April      92570  2000    2014     1          6000
    February   92577  3000    2014     2          3000
    March      92574  2500    2014     2          5500
    April      92571  3750    2014     2          9250
    February   92578  2000    2014     3          2000
    March      92575  2500    2014     3          4500
    April      92572  1450    2014     3          5950
I am really very much confused with how to go about achieving this result. I want to achieve this result using PostgreSQL.
Can anyone suggest how to go about achieving this result-set?
A cumulative total, or running total, is the sum of all values in a column up to and including the current row.
Cumulative sums, or running totals, are used to display the total sum of data as it grows with time (or any other series or progression). This lets you view the total contribution so far of a given measure against time.
In Hadoop Hive, the SUM and AVG analytic functions, combined with a window specification, compute a cumulative (running) sum and a cumulative average.
Basically, you need a window function. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as window function in Postgres by appending an OVER clause.
The special difficulty here is to get partitions and sort order right:
    SELECT ea_month, id, amount, ea_year, circle_id
         , sum(amount) OVER (PARTITION BY circle_id
                             ORDER BY ea_year, ea_month) AS cum_amt
    FROM   tbl
    ORDER  BY circle_id, ea_year, ea_month;
And no GROUP BY.
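To see why no GROUP BY is needed, here is a minimal sketch on made-up rows. The names t, grp, ord and val are purely illustrative and not part of your schema. The same aggregate keeps one output row per input row, and adding ORDER BY inside OVER turns the partition total into a running sum:

```sql
-- Hypothetical sample data; t, grp, ord and val are illustrative names only.
WITH t(grp, ord, val) AS (
   VALUES (1, 1, 1000), (1, 2, 3000), (1, 3, 2000)
        , (2, 1, 3000), (2, 2, 2500)
)
SELECT grp, ord, val
       -- no ORDER BY in the window: the frame is the whole partition
     , sum(val) OVER (PARTITION BY grp)              AS grp_total
       -- with ORDER BY: the frame runs from the partition start to the current row
     , sum(val) OVER (PARTITION BY grp ORDER BY ord) AS running_sum
FROM   t
ORDER  BY grp, ord;

-- grp | ord | val  | grp_total | running_sum
--   1 |   1 | 1000 |      6000 |        1000
--   1 |   2 | 3000 |      6000 |        4000
--   1 |   3 | 2000 |      6000 |        6000
--   2 |   1 | 3000 |      5500 |        3000
--   2 |   2 | 2500 |      5500 |        5500
```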
The sum for each row is calculated from the first row in the partition to the current row - or quoting the manual to be precise:
"The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer."
... which is the cumulative or running sum you are after. Bold emphasis mine.
Rows with the same (circle_id, ea_year, ea_month) are "peers" in this query. All of those show the same running sum, with all peers added to the sum. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month); in that case the sort order is deterministic and no row has peers.
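If the table is not unique on those columns, the effect of peers can be illustrated roughly like this. The rows and the names t, grp, ord and val are made up for the sketch. The default RANGE frame gives both peers the same sum, while an explicit ROWS frame counts row by row:

```sql
-- Hypothetical data with two peers: both middle rows share ord = 2.
WITH t(grp, ord, val) AS (
   VALUES (1, 1, 10), (1, 2, 20), (1, 2, 30), (1, 3, 40)
)
SELECT grp, ord, val
       -- default frame (RANGE): peers are added together -> 10, 60, 60, 100
     , sum(val) OVER (PARTITION BY grp ORDER BY ord) AS sum_range
       -- ROWS frame: strictly row by row; which of the two peers comes first
       -- is arbitrary, so the values on the peer rows are not deterministic
     , sum(val) OVER (PARTITION BY grp ORDER BY ord
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows
FROM   t
ORDER  BY grp, ord, val;
```

Adding a tiebreaker column to the window's ORDER BY (for instance id in your table) would make the ROWS variant deterministic.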
Postgres 11 added tools to include / exclude peers with the new frame_exclusion options.
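As a rough sketch of those options (Postgres 11 or later, again on made-up rows with the illustrative names t, grp, ord and val), EXCLUDE GROUP drops the current row together with its peers, and EXCLUDE TIES drops only the peers:

```sql
-- Hypothetical data with two peers: both middle rows share ord = 2.
WITH t(grp, ord, val) AS (
   VALUES (1, 1, 10), (1, 2, 20), (1, 2, 30), (1, 3, 40)
)
SELECT grp, ord, val
       -- current row and its peers removed from the frame -> NULL, 10, 10, 60
     , sum(val) OVER (PARTITION BY grp ORDER BY ord
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                      EXCLUDE GROUP) AS sum_before_group
       -- peers removed, current row kept -> 10, 30, 40, 100
     , sum(val) OVER (PARTITION BY grp ORDER BY ord
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                      EXCLUDE TIES) AS sum_without_ties
FROM   t
ORDER  BY grp, ord, val;
```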
Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting.
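A quick way to check this, sketched with the three month names from the question (v and m are illustrative names):

```sql
-- Month names sorted as plain strings.
SELECT m
FROM  (VALUES ('February'), ('March'), ('April')) AS v(m)
ORDER  BY m;

-- In typical locales this comes back as April, February, March,
-- which is alphabetical rather than calendar order.
```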
If you have actual date values stored in your table you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column mon of type date in your table.
Transform what you have with to_date():

    to_date(ea_year || ea_month, 'YYYYMonth') AS mon
For display, you can get original strings with to_char():

    to_char(mon, 'Month') AS ea_month
    to_char(mon, 'YYYY') AS ea_year
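A small round-trip sketch, assuming ea_year is an integer and ea_month is text (the question does not state the column types, so treat both as assumptions). Note that the 'Month' pattern blank-pads the name to 9 characters; the FM prefix suppresses that padding:

```sql
-- One hypothetical input row; the column types are assumptions.
SELECT mon
     , to_char(mon, 'Month')   AS ea_month_padded  -- blank-padded to 9 characters
     , to_char(mon, 'FMMonth') AS ea_month         -- FM prefix removes the padding
     , to_char(mon, 'YYYY')    AS ea_year
FROM  (
   SELECT to_date(ea_year || ea_month, 'YYYYMonth') AS mon
   FROM  (VALUES (2014, 'February')) AS v(ea_year, ea_month)
   ) sub;
```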
While stuck with the unfortunate design, this will work:

    SELECT ea_month, id, amount, ea_year, circle_id
         , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS cum_amt
    FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM tbl) sub
    ORDER  BY circle_id, mon;
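To actually load the target table from staging, the query can feed an INSERT. This is only a sketch: the table names staging and target and the column types are assumptions, since the question names neither, and the sample rows are the circle 1 and circle 2 rows from the staging data above.

```sql
-- Hypothetical table definitions; names and column types are assumptions.
CREATE TEMP TABLE staging (
   ea_month  text
 , id        int
 , amount    numeric
 , ea_year   int
 , circle_id int
);

CREATE TEMP TABLE target (
   ea_month  text
 , id        int
 , amount    numeric
 , ea_year   int
 , circle_id int
 , cum_amt   numeric
);

INSERT INTO staging (ea_month, id, amount, ea_year, circle_id) VALUES
   ('April'   , 92570, 1000, 2014, 1)
 , ('April'   , 92571, 3000, 2014, 2)
 , ('March'   , 92573, 3000, 2014, 1)
 , ('March'   , 92574, 2500, 2014, 2)
 , ('February', 92576, 2000, 2014, 1)
 , ('February', 92577, 2500, 2014, 2);

-- Compute the running sum per circle in calendar order and store it.
INSERT INTO target (ea_month, id, amount, ea_year, circle_id, cum_amt)
SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon)
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM staging) sub;

SELECT * FROM target ORDER BY circle_id, cum_amt;
```

An ORDER BY on the INSERT itself is not needed, because the physical row order in the target table carries no meaning; sort when you read from it instead.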