Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need some tSQL Wizardry: SQL Update Based on Running Total

I've got an implementation for this that uses a super hairy recursive CTE that is really hard to follow/maintain. I was hoping that one of the brains on SO could come up with some more straightforward tSQL approach code to accomplish the following:

Table Documents

DocID    SortOrder    PageCount    StartPgNum   EndPgNum
5        1            2               {1}          {2}
8        2            7               {3}          {9}
22       3            3               {10}         {12}

For the table given above, I need a query to populate StartPgNum and EndPgNum (Sample values included in the example in {} to make the intentions clearer for what I need.

Assumptions:
* DocID, SortOrder, and PageCount are pre-populated.
* StartPgNum and EndgNum need to be populated by tSQL code.
* SortOrder always starts at 1, and is continuous with no gaps.
* Documents should get a continuous page numbering scheme as ordered by SortOrder

like image 388
JohnFx Avatar asked Mar 04 '10 00:03

JohnFx


2 Answers

Updated to be better :)

DECLARE @temp TABLE (DocID INT, SortOrder INT, PageCount INT)

INSERT INTO @temp VALUES (5, 1, 2)
INSERT INTO @temp VALUES (8, 2, 7)
INSERT INTO @temp VALUES (22, 3, 3)

SELECT
    *,
    StartPgNum + PageCount-1 AS EndPgNum
FROM
(SELECT
    DocID,
    SortOrder,
    PageCount,
    ISNULL((SELECT SUM(PageCount)+1 FROM @temp WHERE SortOrder < parent.SortOrder), 1) AS StartPgNum
FROM
    @temp parent) _temp
like image 172
Timothy Khouri Avatar answered Sep 30 '22 16:09

Timothy Khouri


I did some testing on all of the solutions provided here in the other answers, my original "Hairy Recursive CTE" option and for the sake of completeness a simple cursor based approach. To my great surprise the cursor option performed the best by a clear margin in all my tests (1K Rows, 10KRows, 50K Rows, 500K Rows)

Here are the average times for each approach for 10K records:
Hairy Recursive CTE: 3 minutes 55 seconds
CROSS APPLY (Ben Dempsey): 21-25 seconds
SUBSELECTS (Tim Khouri): 19-21 seconds
CURSOR: 1-2 Seconds

Here is my cursor based solution:

Declare @temp TABLE(
 DocID INT PRIMARY KEY NOT NULL, 
 SortOrder INT NOT NULL, 
 PageCount INT NOT NULL,
 BegPg int,
 EndPg int
)

Insert into @temp (DocID,SortOrder,PageCount) 
SELECT top 50000 docid, ROW_NUMBER() OVER (ORDER BY DOCID),Pages FROM tblDocuments

DECLARE @PC int
SET @PC=1
DECLARE @FetchPageCount int
DECLARE @FetchDocID int

DECLARE myCursor CURSOR FOR 
SELECT DocID, PageCount FROM @temp ORDER BY SortOrder

OPEN myCursor
FETCH NEXT FROM myCursor INTO @FetchDocID,@FetchPageCount

WHILE @@FETCH_STATUS = 0
BEGIN

  UPDATE @temp SET BegPg=@PC, EndPg=@PC+ @FetchPageCount-1   
  WHERE (Docid=@fetchDocid)

     SET @PC = @PC + @FetchPageCount

    FETCH NEXT FROM myCursor INTO @FetchDocID,@FetchPageCount
END 
CLOSE myCursor
DEALLOCATE myCursor

SELECT * FROM @temp

Who would have guessed it? Maybe cursors aren't always evil.

A word of warning: Lest you be tempted to replace the update to the "WHERE CURRENT OF myCursor" syntax, it performed much slower than using the current version with a where clause, although still faster than most of the other approaches.

like image 38
JohnFx Avatar answered Sep 30 '22 15:09

JohnFx