Calculate running total / running balance

Tags:

sql-server

tsql

cumulative-sum

I have a table:

create table Transactions(Tid int,amt int)

With 5 rows:

insert into Transactions values(1, 100)
insert into Transactions values(2, -50)
insert into Transactions values(3, 100)
insert into Transactions values(4, -100)
insert into Transactions values(5, 200)

Desired output:

TID  amt    balance
---  -----  -------
1    100    100
2    -50    50
3    100    150
4    -100   50
5    200    250

Basically, for the first record the balance is the same as amt; from the second record onwards, the balance is the previous balance plus the current amt. I am looking for an optimal approach. I could think of using a function or a correlated subquery, but I am not sure exactly how to do it.

asked Jul 03 '12 12:07

Pritesh

2 Answers

For those not using SQL Server 2012 or above, a cursor is likely the most efficient supported and guaranteed method outside of CLR. There are other approaches, such as the "quirky update", which can be marginally faster but is not guaranteed to work in the future; set-based approaches, whose cost grows steeply as the table gets larger; and recursive CTE methods, which often require direct tempdb I/O or result in spills that yield roughly the same impact.

INNER JOIN - do not do this:

The slow, set-based approach is of the form:

SELECT t1.TID, t1.amt, RunningTotal = SUM(t2.amt)
FROM dbo.Transactions AS t1
INNER JOIN dbo.Transactions AS t2
  ON t1.TID >= t2.TID
GROUP BY t1.TID, t1.amt
ORDER BY t1.TID;

The reason this is slow? As the table gets larger, each incremental row requires reading n-1 rows in the table. The total work therefore grows quadratically (roughly n²/2 row reads for n rows), and the query is bound for failures, timeouts, or just angry users.
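To put a number on that growth, here is a minimal sketch that counts the rows the self-join produces before the GROUP BY. The #demo table is a hypothetical stand-in for dbo.Transactions, loaded with the five rows from the question.

```sql
-- Sketch only: #demo is a hypothetical stand-in for dbo.Transactions.
CREATE TABLE #demo (TID INT PRIMARY KEY, amt INT);

INSERT INTO #demo (TID, amt) VALUES (1, 100);
INSERT INTO #demo (TID, amt) VALUES (2, -50);
INSERT INTO #demo (TID, amt) VALUES (3, 100);
INSERT INTO #demo (TID, amt) VALUES (4, -100);
INSERT INTO #demo (TID, amt) VALUES (5, 200);

-- Each row pairs with itself and with every earlier row,
-- so n rows yield n * (n + 1) / 2 joined rows before grouping.
SELECT JoinedRows = COUNT(*)
FROM #demo AS t1
INNER JOIN #demo AS t2
  ON t1.TID >= t2.TID;   -- 15 for these 5 rows

DROP TABLE #demo;
```

By the same formula, 10,000 rows would produce about 50 million joined rows, which is why this form stops being usable long before the table is what anyone would call large.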

Correlated subquery - do not do this either:

The subquery form is similarly painful for similarly painful reasons.

SELECT TID, amt, RunningTotal = amt + COALESCE(
(
  SELECT SUM(amt)
    FROM dbo.Transactions AS i
    WHERE i.TID < o.TID), 0
)
FROM dbo.Transactions AS o
ORDER BY TID;

Quirky update - do this at your own risk:

The "quirky update" method is more efficient than the above, but the behavior is not documented, there are no guarantees about order, and the behavior might work today but could break in the future. I'm including this because it is a popular method and it is efficient, but that doesn't mean I endorse it. The primary reason I even answered this question instead of closing it as a duplicate is because the other question has a quirky update as the accepted answer.

DECLARE @t TABLE
(
  TID INT PRIMARY KEY,
  amt INT,
  RunningTotal INT
);

DECLARE @RunningTotal INT = 0;

INSERT @t(TID, amt, RunningTotal)
  SELECT TID, amt, RunningTotal = 0
  FROM dbo.Transactions
  ORDER BY TID;

UPDATE @t
  SET @RunningTotal = RunningTotal = @RunningTotal + amt
  FROM @t;

SELECT TID, amt, RunningTotal
  FROM @t
  ORDER BY TID;

Recursive CTEs

This first one relies on TID being contiguous (no gaps) and starting at 1:

;WITH x AS
(
  SELECT TID, amt, RunningTotal = amt
    FROM dbo.Transactions
    WHERE TID = 1
  UNION ALL
  SELECT y.TID, y.amt, x.RunningTotal + y.amt
    FROM x
    INNER JOIN dbo.Transactions AS y
    ON y.TID = x.TID + 1
)
SELECT TID, amt, RunningTotal
  FROM x
  ORDER BY TID
  OPTION (MAXRECURSION 10000);
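Before trusting the query above, it may be worth checking that the data really meets its assumptions. This is only a sketch of one way to do that; the IsContiguous column name is mine, not part of the original query.

```sql
-- Sketch: returns 1 when TID runs 1, 2, 3, ... n with no gaps or
-- duplicates, which is what the join on y.TID = x.TID + 1 needs.
SELECT IsContiguous =
  CASE WHEN MIN(TID) = 1
        AND MAX(TID) = COUNT(*)
        AND COUNT(*) = COUNT(DISTINCT TID)
       THEN 1 ELSE 0 END
FROM dbo.Transactions;
```

For the five sample rows this returns 1. An empty table returns 0, because the comparisons against NULL fall through to the ELSE branch. Keep in mind that the MAXRECURSION hint also caps how many rows the recursion can walk; OPTION (MAXRECURSION 0) removes that limit.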

If you can't rely on this, then you can use this variation, which simply builds a contiguous sequence using ROW_NUMBER():

;WITH y AS
(
  SELECT TID, amt, rn = ROW_NUMBER() OVER (ORDER BY TID)
    FROM dbo.Transactions
), x AS
(
    SELECT TID, rn, amt, rt = amt
      FROM y
      WHERE rn = 1
    UNION ALL
    SELECT y.TID, y.rn, y.amt, x.rt + y.amt
      FROM x INNER JOIN y
      ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
  FROM x
  ORDER BY x.rn
  OPTION (MAXRECURSION 10000);

Depending on the size of the data (e.g. columns we don't know about), you may find better overall performance by stuffing the relevant columns only in a #temp table first, and processing against that instead of the base table:

CREATE TABLE #x
(
  rn  INT PRIMARY KEY,
  TID INT,
  amt INT
);

INSERT INTO #x (rn, TID, amt)
SELECT ROW_NUMBER() OVER (ORDER BY TID),
  TID, amt
FROM dbo.Transactions;

;WITH x AS
(
  SELECT TID, rn, amt, rt = amt
    FROM #x
    WHERE rn = 1
  UNION ALL
  SELECT y.TID, y.rn, y.amt, x.rt + y.amt
    FROM x INNER JOIN #x AS y
    ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
  FROM x
  ORDER BY TID
  OPTION (MAXRECURSION 10000);

DROP TABLE #x;

Only the first CTE method will provide performance rivaling the quirky update, but it makes a big assumption about the nature of the data (no gaps). The other two methods fall behind, and in those cases you may as well use a cursor (if you can't use CLR and you're not yet on SQL Server 2012 or above).

Cursor

Everybody is told that cursors are evil, and that they should be avoided at all costs, but this actually beats the performance of most other supported methods, and is safer than the quirky update. The only ones I prefer over the cursor solution are the 2012 and CLR methods (below):

CREATE TABLE #x
(
  TID INT PRIMARY KEY,
  amt INT,
  rt INT
);

INSERT #x(TID, amt)
  SELECT TID, amt
  FROM dbo.Transactions
  ORDER BY TID;

DECLARE @rt INT, @tid INT, @amt INT;
SET @rt = 0;

DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
  FOR SELECT TID, amt FROM #x ORDER BY TID;

OPEN c;

FETCH c INTO @tid, @amt;

WHILE @@FETCH_STATUS = 0
BEGIN
  SET @rt = @rt + @amt;
  UPDATE #x SET rt = @rt WHERE TID = @tid;
  FETCH c INTO @tid, @amt;
END

CLOSE c; DEALLOCATE c;

SELECT TID, amt, RunningTotal = rt
  FROM #x
  ORDER BY TID;

DROP TABLE #x;

SQL Server 2012 or above

New window functions introduced in SQL Server 2012 make this task a lot easier (and it performs better than all of the above methods as well):

SELECT TID, amt,
  RunningTotal = SUM(amt) OVER (ORDER BY TID ROWS UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;

Note that on larger data sets, you'll find that the above performs much better than either of the following two options, since RANGE uses an on-disk spool (and the default uses RANGE). However, the results can also differ: when the ORDER BY column contains duplicate values, RANGE gives every tied row the same total (the sum through the last of the ties), while ROWS accumulates one row at a time. Be sure both return correct results for your data before choosing between them on performance alone.

SELECT TID, amt,
  RunningTotal = SUM(amt) OVER (ORDER BY TID)
FROM dbo.Transactions
ORDER BY TID;

SELECT TID, amt,
  RunningTotal = SUM(amt) OVER (ORDER BY TID RANGE UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;
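The difference in results between ROWS and RANGE only shows up when the ordering column has ties. Here is a small sketch that makes it visible; the table variable @d and its columns are hypothetical, and the key k = 1 deliberately appears twice.

```sql
-- Sketch with a hypothetical table variable; k = 1 appears twice.
DECLARE @d TABLE (k INT, amt INT);

INSERT INTO @d (k, amt) VALUES (1, 10);
INSERT INTO @d (k, amt) VALUES (1, 10);
INSERT INTO @d (k, amt) VALUES (2, 5);

SELECT k, amt,
  RowsTotal  = SUM(amt) OVER (ORDER BY k ROWS  UNBOUNDED PRECEDING),
  RangeTotal = SUM(amt) OVER (ORDER BY k RANGE UNBOUNDED PRECEDING)
FROM @d
ORDER BY k, RowsTotal;

-- k  amt  RowsTotal  RangeTotal
-- 1  10   10         20
-- 1  10   20         20
-- 2  5    25         25
```

With a unique ordering column such as TID in the question, both frames return the same totals, so the choice there comes down to performance.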

CLR

For completeness, I'm offering a link to Pavel Pawlowski's CLR method, which is by far the preferable method on versions prior to SQL Server 2012 (but not 2000 obviously).

http://www.pawlowski.cz/2010/09/sql-server-and-fastest-running-totals-using-clr/

Conclusion

If you are on SQL Server 2012 or above, the choice is obvious: use the new SUM() OVER() construct (with ROWS rather than RANGE). For earlier versions, you'll want to compare the performance of the alternative approaches on your schema and data and, keeping non-performance-related factors in mind, determine which approach is right for you. It very well may be the CLR approach. Here are my recommendations, in order of preference:

1. SUM() OVER() ... ROWS, if on 2012 or above
2. CLR method, if possible
3. First recursive CTE method, if possible
4. Cursor
5. The other recursive CTE methods
6. Quirky update
7. Join and/or correlated subquery
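If you are not sure which side of the 2012 line a given instance falls on, a quick check along these lines can help. This is a sketch; the variable and column names are mine, and it relies on SQL Server 2012 reporting major version 11.

```sql
-- Sketch: ProductVersion looks like 'major.minor.build.revision',
-- and PARSENAME(..., 4) pulls out the first of the four parts.
DECLARE @major INT;

SET @major = CAST(PARSENAME(
  CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR(32)), 4) AS INT);

SELECT MajorVersion = @major,
       CanUseWindowFrames = CASE WHEN @major >= 11 THEN 1 ELSE 0 END;
```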

For further information with performance comparisons of these methods, see this question on http://dba.stackexchange.com:

https://dba.stackexchange.com/questions/19507/running-total-with-count

I've also blogged more details about these comparisons here:

http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals

Also for grouped/partitioned running totals, see the following posts:

http://sqlperformance.com/2014/01/t-sql-queries/grouped-running-totals

Partitioning results in a running totals query

Multiple Running Totals with Group By
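For the grouped case on SQL Server 2012 or above, the windowed form only needs a PARTITION BY. A minimal sketch, assuming a hypothetical AccountId column that does not exist in the table from the question:

```sql
-- Sketch: AccountId is a hypothetical grouping column.
DECLARE @t TABLE (AccountId INT, TID INT PRIMARY KEY, amt INT);

INSERT INTO @t (AccountId, TID, amt)
VALUES (1, 1, 100), (1, 2, -50), (2, 3, 100), (2, 4, -100), (1, 5, 200);

-- The running total restarts for each AccountId.
SELECT AccountId, TID, amt,
  RunningTotal = SUM(amt) OVER (PARTITION BY AccountId
                                ORDER BY TID
                                ROWS UNBOUNDED PRECEDING)
FROM @t
ORDER BY AccountId, TID;

-- AccountId  TID  amt   RunningTotal
-- 1          1    100   100
-- 1          2    -50   50
-- 1          5    200   250
-- 2          3    100   100
-- 2          4    -100  0
```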

answered Nov 16 '22 00:11

Aaron Bertrand

If you are on SQL Server 2012 or later, here is a solution:

select *, sum(amt) over (order by Tid) as running_total from Transactions

For earlier versions

select *,(select sum(amt) from Transactions where Tid<=t.Tid) as running_total from Transactions as t

answered Nov 16 '22 02:11

Madhivanan
