
Calculating Cumulative Sum in PostgreSQL

Tags: sql, postgresql, window-functions, cumulative-sum, analytic-functions

I want to compute the cumulative (running) sum of a field and insert it from a staging table into a target table. My staging table looks something like this:

ea_month    id       amount    ea_year    circle_id
April       92570    1000      2014       1
April       92571    3000      2014       2
April       92572    2000      2014       3
March       92573    3000      2014       1
March       92574    2500      2014       2
March       92575    3750      2014       3
February    92576    2000      2014       1
February    92577    2500      2014       2
February    92578    1450      2014       3

I want my target table to look something like this:

ea_month    id       amount    ea_year    circle_id    cum_amt
February    92576    1000      2014       1            1000
March       92573    3000      2014       1            4000
April       92570    2000      2014       1            6000
February    92577    3000      2014       2            3000
March       92574    2500      2014       2            5500
April       92571    3750      2014       2            9250
February    92578    2000      2014       3            2000
March       92575    2500      2014       3            4500
April       92572    1450      2014       3            5950

I am quite confused about how to achieve this result, and I want to do it in PostgreSQL.

Can anyone suggest how to produce this result set?

877

asked Apr 03 '14 14:04

Yousuf Sultan

1 Answer

Basically, you need a window function. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.

The special difficulty here is to get partitions and sort order right:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, ea_month;

And no GROUP BY.

The sum for each row is calculated from the first row in the partition to the current row. Or, quoting the manual to be precise:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

... which is the cumulative or running sum you are after.

Rows with the same (circle_id, ea_year, ea_month) are "peers" in this query. All peers show the same running sum, with every peer's amount already added to it. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month). In that case the sort order is deterministic and no row has peers.
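To see what peers do to a running sum, here is a small sketch with made-up values (the alias t and the columns k and x are hypothetical, not from the question). With the default RANGE frame, rows that share a sort key get the same total; a ROWS frame stops at the current row instead:

```sql
-- Two rows share k = 2, so they are peers under ORDER BY k.
SELECT k, x
     , sum(x) OVER (ORDER BY k)                          AS sum_range  -- default frame: RANGE UNBOUNDED PRECEDING
     , sum(x) OVER (ORDER BY k ROWS UNBOUNDED PRECEDING) AS sum_rows   -- frame ends at the current row itself
FROM  (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS t(k, x)
ORDER  BY k;
```

sum_range comes out as 10, 60, 60, 100: both k = 2 rows already include each other. sum_rows adds one row at a time, but the order among the two peers is not defined, so the intermediate values can differ between runs.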

Postgres 11 added tools to include / exclude peers with the new frame_exclusion options. See:

Aggregating all values not in the same group
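As a rough illustration of the frame exclusion options (Postgres 11 or later; the alias t and the columns k and x are made up for this sketch), EXCLUDE TIES keeps the current row but drops its peers from the frame, while EXCLUDE GROUP drops the current row together with its peers:

```sql
-- Two rows share k = 2, so they are peers under ORDER BY k.
SELECT k, x
     , sum(x) OVER (ORDER BY k RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                    EXCLUDE TIES)  AS sum_excl_ties   -- current row stays, its peers are dropped
     , sum(x) OVER (ORDER BY k RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                    EXCLUDE GROUP) AS sum_excl_group  -- current row and its peers are dropped
FROM  (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS t(k, x)
ORDER  BY k, x;
```

For the row (2, 20), sum_excl_ties is 30 (10 + 20) and sum_excl_group is 10. For the first row, sum_excl_group is NULL because the frame is empty.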

Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting.
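A quick way to see the problem, using the three month names from the question (the alias t is only for this sketch):

```sql
-- Month names stored as strings sort by spelling, not by calendar order.
SELECT ea_month
FROM  (VALUES ('February'), ('March'), ('April')) AS t(ea_month)
ORDER  BY ea_month;
-- April, February, March
```

A running sum ordered this way would add April before February.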

If you have actual date values stored in your table you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column mon of type date in your table.

Transform what you have with to_date():

  to_date(ea_year || ea_month , 'YYYYMonth') AS mon

For display, you can get original strings with to_char():

  to_char(mon, 'Month') AS ea_month
  to_char(mon, 'YYYY') AS ea_year
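A one-time migration along those lines might look like the sketch below. It assumes the table is called tbl, that every ea_year / ea_month pair parses cleanly, and that nothing else depends on the two old columns, so try it on a copy first.

```sql
BEGIN;

ALTER TABLE tbl ADD COLUMN mon date;

-- Build the first day of each month from the existing year and month name.
UPDATE tbl
SET    mon = to_date(ea_year || ea_month, 'YYYYMonth');

-- Assumption: no row is left without a date, and the old columns are no longer needed.
ALTER TABLE tbl
  ALTER COLUMN mon SET NOT NULL
, DROP COLUMN ea_year
, DROP COLUMN ea_month;

COMMIT;

-- Recreate the original strings for display.
-- 'Month' is blank-padded to 9 characters; the FM prefix suppresses the padding.
SELECT to_char(mon, 'FMMonth') AS ea_month
     , to_char(mon, 'YYYY')    AS ea_year
FROM   tbl;
```

Note that to_char() returns text, so ea_year comes back as a string here even if the original column was numeric.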

While stuck with the unfortunate design, this will work:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS cum_amt
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM tbl) sub
ORDER  BY circle_id, mon;
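The question also asks to load the result into a target table. A minimal sketch, assuming the staging table is called tbl and using a hypothetical target table target_tbl; both table definitions and the column types are assumptions, since the question does not show them:

```sql
-- Hypothetical staging table, filled with the rows from the question.
CREATE TEMP TABLE tbl (
  ea_month  text
, id        int PRIMARY KEY
, amount    numeric
, ea_year   int
, circle_id int
);

INSERT INTO tbl (ea_month, id, amount, ea_year, circle_id) VALUES
  ('April',    92570, 1000, 2014, 1)
, ('April',    92571, 3000, 2014, 2)
, ('April',    92572, 2000, 2014, 3)
, ('March',    92573, 3000, 2014, 1)
, ('March',    92574, 2500, 2014, 2)
, ('March',    92575, 3750, 2014, 3)
, ('February', 92576, 2000, 2014, 1)
, ('February', 92577, 2500, 2014, 2)
, ('February', 92578, 1450, 2014, 3);

-- Hypothetical target table: same columns plus the running sum.
CREATE TEMP TABLE target_tbl (
  ea_month  text
, id        int PRIMARY KEY
, amount    numeric
, ea_year   int
, circle_id int
, cum_amt   numeric
);

-- Compute the running sum per circle_id in calendar order and store it.
INSERT INTO target_tbl (ea_month, id, amount, ea_year, circle_id, cum_amt)
SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon)
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM tbl) sub;

-- Inspect the result in calendar order.
SELECT *
FROM   target_tbl
ORDER  BY circle_id, to_date(ea_year || ea_month, 'YYYYMonth');
```

With the staging rows as listed, circle_id 1 ends up with cum_amt 2000, 5000, 6000 for February, March, April. That differs from the desired output shown in the question, because the amounts in that example do not match the staging rows.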

141

answered Sep 20 '22 05:09

Erwin Brandstetter
