Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing a SQL query to remove a cursor

I am trying to write a query that will go through a table and apply any credits that are on an account to the oldest balance. I could not figure out a way to do this without using a cursor, and I know cursors should be avoided at all costs if possible, so I am coming here for help.

select * into #balances from [IDAT_AR_BALANCES] where amount > 0
select * into #credits from [IDAT_AR_BALANCES] where amount < 0

create index ba_ID on #balances (CLIENT_ID)
create index cr_ID on #credits (CLIENT_ID)

declare credit_cursor cursor for
select [CLIENT_ID], amount, cvtGUID from #credits

open credit_cursor
declare @client_id varchar(11)
declare @credit money
declare @balance money
declare @cvtGuidBalance uniqueidentifier
declare @cvtGuidCredit uniqueidentifier
fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
while @@fetch_status = 0
begin
      while(@credit < 0 and (select count(*) from #balances where @client_id = CLIENT_ID and amount <> 0) > 0)
      begin
            select top 1  @balance = amount, @cvtGuidBalance = cvtGuid from #balances where @client_id = CLIENT_ID and amount <> 0 order by AGING_DATE
            set @credit = @balance + @credit
            if(@credit > 0)
            begin
                  update #balances set amount = @credit where cvtGuid = @cvtGuidBalance
                  set @credit = 0
            end
            else
            begin
                  update #balances set amount = 0 where cvtGuid = @cvtGuidBalance
            end
      end
      update #credits set amount = @credit where cvtGuid = @cvtGuidCredit
      fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
end

close credit_cursor
deallocate credit_cursor

delete #balances where AMOUNT = 0
delete #credits where AMOUNT = 0

truncate table [IDAT_AR_BALANCES]

insert [IDAT_AR_BALANCES] select * from #balances
insert [IDAT_AR_BALANCES] select * from #credits

drop table #balances
drop table #credits

In my test cases for 10000 records and 1000 clients it take 26 seconds to run, by adding the two indexes on CLIENT_ID I was able to bring down the number to 14 seconds. However this is still too slow for what I need, the final result could have as many as 10000 clients and over 4,000,000 records so the run time could easily become in the double digit minutes.

Any recommendations on how I can restructure this to remove the cursor would be greatly appreciated.

Example (updated to show that you could have multiple credits after it has been run):

before
cvtGuid      client_id      ammount     AGING_DATE
xxxxxx       1              20.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              -10.00      1/3/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       2              20.00       1/1/2011
xxxxxx       2              15.00       1/2/2011
xxxxxx       2              -40.00      1/3/2011
xxxxxx       2              5.00        1/4/2011
xxxxxx       3              10.00       1/1/2011
xxxxxx       3              -20.00      1/2/2011
xxxxxx       3              5.00        1/3/2011
xxxxxx       3              -8.00       1/4/2011

after
cvtGuid      client_id      ammount     AGING_DATE
xxxxxx       1              10.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       3              -5.00       1/2/2011
xxxxxx       3              -8.00       1/4/2011

so it will apply the negative credit to the oldest positive balance (client 1 in the example), if there are no remaining positive balances after it is done it leaves the remaining negative (client 3), if they perfectly cancel out (this is the case 90% of the time with the real data) it will totally remove the record (client 2).

like image 341
Scott Chamberlain Avatar asked Apr 27 '11 19:04

Scott Chamberlain


1 Answers

It is possible to solve this with the help of a recursive CTE.

The basic idea is this:

  1. Get the totals of positive and negative values separately for every account (client_id).

  2. Iterate over every account and 'pinch off' an amount of one of the two totals, depending on amount's sign and absolute value (i.e. never 'pinch off' of the corresponding total more than its current value). The same value should be added to/subtracted from amount.

  3. After update, delete those rows where amount has become 0.

For my solution I've borrowed Lieven's table variable definition (thank you!), adding one column (cvtGuid, declared as int for the purpose of the demonstration) and one row (the last one from the original example, which was missing from Lieven's script).

/* preparing the demonstration data */
DECLARE @IDAT_AR_BALANCES TABLE (
  cvtGuid int IDENTITY,
  client_id INTEGER
  , amount FLOAT
  , date DATE
);
INSERT INTO @IDAT_AR_BALANCES
  SELECT 1, 20.00, '1/1/2011'
  UNION ALL SELECT 1, 30.00, '1/2/2011'
  UNION ALL SELECT 1, -10.00, '1/3/2011'
  UNION ALL SELECT 1, 5.00, '1/4/2011'
  UNION ALL SELECT 2, 20.00, '1/1/2011'
  UNION ALL SELECT 2, 15.00, '1/2/2011'
  UNION ALL SELECT 2, -40.00, '1/3/2011'
  UNION ALL SELECT 2, 5.00, '1/4/2011'
  UNION ALL SELECT 3, 10.00, '1/1/2011'
  UNION ALL SELECT 3, -20.00, '1/2/2011'
  UNION ALL SELECT 3, 5.00, '1/3/2011'
  UNION ALL SELECT 3, -8.00, '1/4/2011';

/* checking the original contents */
SELECT * FROM @IDAT_AR_BALANCES;

/* getting on with the job: */
WITH totals AS (
  SELECT
    /* 1) preparing the totals */
    client_id,
    total_pos = SUM(CASE WHEN amount > 0 THEN amount END),
    total_neg = SUM(CASE WHEN amount < 0 THEN amount END)
  FROM @IDAT_AR_BALANCES
  GROUP BY client_id
),
refined AS (
  /* 2) refining the original data with auxiliary columns:
     * rownum - row numbers (unique within accounts);
     * amount_to_discard_pos - the amount to discard `amount` completely if it's negative;
     * amount_to_discard_neg - the amount to discard `amount` completely if it's positive
  */
  SELECT
    *,
    rownum = ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date),
    amount_to_discard_pos = CAST(CASE WHEN amount < 0 THEN -amount ELSE 0 END AS float),
    amount_to_discard_neg = CAST(CASE WHEN amount > 0 THEN -amount ELSE 0 END AS float)
  FROM @IDAT_AR_BALANCES
),
prepared AS (
  /* 3) preparing the final table (using a recursive CTE) */
  SELECT
    cvtGuid = CAST(NULL AS int),
    client_id,
    amount = CAST(NULL AS float),
    date = CAST(NULL AS date),
    amount_update = CAST(NULL AS float),
    running_balance_pos = total_pos,
    running_balance_neg = total_neg,
    rownum = CAST(0 AS bigint)
  FROM totals
  UNION ALL
  SELECT
    n.cvtGuid,
    n.client_id,
    n.amount,
    n.date,
    amount_update = CAST(
      CASE
        WHEN n.amount_to_discard_pos < p.running_balance_pos
        THEN n.amount_to_discard_pos
        ELSE p.running_balance_pos
      END
      +
      CASE
        WHEN n.amount_to_discard_neg > p.running_balance_neg
        THEN n.amount_to_discard_neg
        ELSE p.running_balance_neg
      END
    AS float),
    running_balance_pos = CAST(p.running_balance_pos -
      CASE
        WHEN n.amount_to_discard_pos < p.running_balance_pos
        THEN n.amount_to_discard_pos
        ELSE p.running_balance_pos
      END
    AS float),
    running_balance_neg = CAST(p.running_balance_neg -
      CASE
        WHEN n.amount_to_discard_neg > p.running_balance_neg
        THEN n.amount_to_discard_neg
        ELSE p.running_balance_neg
      END
    AS float),
    n.rownum
  FROM refined n
    INNER JOIN prepared p ON n.client_id = p.client_id AND n.rownum = p.rownum + 1
)
/*                  -- some junk that I've forgotten to clean up,
SELECT *            -- which you might actually want to use
FROM prepared       -- to view the final prepared result set
WHERE rownum > 0    -- before actually running the update
ORDER BY client_id, rownum
*/
/* performing the update */
UPDATE t
SET amount = t.amount + u.amount_update
FROM @IDAT_AR_BALANCES t INNER JOIN prepared u ON t.cvtGuid = u.cvtGuid
OPTION (MAXRECURSION 0);

/* checking the contents after UPDATE */
SELECT * FROM @IDAT_AR_BALANCES;

/* deleting the eliminated amounts */
DELETE FROM @IDAT_AR_BALANCES WHERE amount = 0;

/* checking the contents after DELETE */
SELECT * FROM @IDAT_AR_BALANCES;

UPDATE

As Lieven has correctly suggested (thank you again!), you can delete all the rows from the accounts where amount adds up to 0 first, then update the other rows. That will increase the overall performance, since, as you say, the majority of the data have their amounts adding up to 0.

Here's a variation on Lieven's solution for deleting 'zero accounts':

DELETE FROM @IDAT_AR_BALANCES
WHERE client_id IN (
  SELECT client_id
  FROM @IDAT_AR_BALANCES
  GROUP BY client_id
  HAVING SUM(amount) = 0
)

Keep in mind, though, that the DELETE after the update will still be needed too, because the update may reset some of the amount values to 0. If I were you, I might consider creating a trigger FOR UPDATE that would automatically delete the rows where amount = 0. Such a solution is not always acceptable, but sometimes is fine. It depends on what else you can possibly do with your data. It may also depend on whether it's solely your project or there are other maintainers as well (who don't like rows 'magically' and unexpectedly disappearing).

like image 102
Andriy M Avatar answered Oct 05 '22 14:10

Andriy M