Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove Rows That Sum Zero For A Given Key

Tags:

sql

sql-server

I have a query that will result in a customer bill being created on our SSRS 2008 R2 server. The SQL Server instance is also 2008 R2. The query is large and I don't want to post the entire thing for security reasons, etc.

What I need to do with the example data below, is to remove the two rows with 73.19 and -73.19 from the result set. So, if two rows have the same absolute value in the LineBalance column and their sum is 0 AND if they have the same value in the REF1 column, the should be removed from the result set. The line with REF1 = 14598 and a line balance of 281.47 should still be returned in the result set and the other two rows below with REF1 = 14598 should not be returned.

The point of this is to "hide" accounting errors and their correction from the customer. by "hide" I mean, not show it on the bill they get in the mail. What happened here is the customer was mistakenly billed 73.19 when they should have been billed 281.47. So, our AR dept. returned 73.19 to their account and charged them the correct amount of 281.47. As you can see they all have the same REF1 value.

An Example Of My Data

like image 343
Brent Lamborn Avatar asked Jan 07 '15 22:01

Brent Lamborn


People also ask

How do you remove a row that has any zero values?

Delete row if cell contains zero with Filter function in Excel. You can use the Filter function to filter out all rows based on the zero values in a certain column, and then delete all visible rows later.

How do I remove zero value rows in Excel?

Select all the filtered rows: Press Ctrl + Home, then press the down-arrow key to go to the first data row, then press Ctrl + Shift + End. Right-click on any selected cell and choose "Delete row" from the context menu or just press Ctrl + - (minus sign). Click OK in the "Delete entire sheet row?" dialog box.


2 Answers

I would add a field that would contain explicit flag telling you that a certain charge was a mistake/reversal of a mistake and then it is trivial to filter out such rows. Doing it on the fly could make your reports rather slow.

But, to solve the given problem as is we can do like this. The solution assumes that SysInvNum is unique.

Create a table with sample data

DECLARE @T TABLE (SysInvNum int, REF1 int, LineBalance money);

INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (3344299, 14602, 558.83);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (3344298, 14598, 281.47);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (3344297, 14602, -95.98);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (3344296, 14598, -73.19);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (3341758, 14598, 73.19);

INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (11, 100, 50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (12, 100, -50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (13, 100, 50.00);

INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (21, 200, -50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (22, 200, -50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (23, 200, 50.00);

INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (31, 300, -50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (32, 300, 50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (33, 300, -50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (34, 300, 50.00);

INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (41, 400, 50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (42, 400, -50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (43, 400, 50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (44, 400, -50.00);
INSERT INTO @T (SysInvNum, REF1, LineBalance) VALUES (45, 400, 50.00);

I've added few more cases that have multiple mistakes.

Number and count rows

SELECT
    SysInvNum
    , REF1
    , LineBalance
    , ROW_NUMBER() OVER(PARTITION BY REF1, LineBalance ORDER BY SysInvNum) AS rn
    , COUNT(*) OVER(PARTITION BY REF1, ABS(LineBalance)) AS cc1
FROM @T AS TT

This is the result set:

SysInvNum    REF1    LineBalance    rn    cc1
11           100      50.00         1     3
12           100     -50.00         1     3
13           100      50.00         2     3
21           200     -50.00         1     3
23           200      50.00         1     3
22           200     -50.00         2     3
31           300     -50.00         1     4
32           300      50.00         1     4
33           300     -50.00         2     4
34           300      50.00         2     4
41           400      50.00         1     5
42           400     -50.00         1     5
43           400      50.00         2     5
44           400     -50.00         2     5
45           400      50.00         3     5
3341758      14598    73.19         1     2
3344296      14598   -73.19         1     2
3344298      14598   281.47         1     1
3344297      14602   -95.98         1     1
3344299      14602   558.83         1     1

You can see that those rows that have mistakes have count > 1. Also, pairs of mistakes have same row numbers. So, we need to remove/hide those rows that have count > 1 and those that have two same row numbers.

Determine rows to remove

WITH
CTE_rn
AS
(
    SELECT
        SysInvNum
        , REF1
        , LineBalance
        , ROW_NUMBER() OVER(PARTITION BY REF1, LineBalance ORDER BY SysInvNum) AS rn
        , COUNT(*) OVER(PARTITION BY REF1, ABS(LineBalance)) AS cc1
    FROM @T AS TT
)
, CTE_ToRemove
AS
(
    SELECT
        SysInvNum
        , REF1
        , LineBalance
        , COUNT(*) OVER(PARTITION BY REF1, rn) AS cc2
    FROM CTE_rn
    WHERE CTE_rn.cc1 > 1
)
SELECT *
FROM CTE_ToRemove
WHERE CTE_ToRemove.cc2 = 2

This is another intermediate result:

SysInvNum    REF1    LineBalance    cc2
12           100     -50.00         2
11           100      50.00         2
21           200     -50.00         2
23           200      50.00         2
32           300      50.00         2
31           300     -50.00         2
33           300     -50.00         2
34           300      50.00         2
42           400     -50.00         2
41           400      50.00         2
43           400      50.00         2
44           400     -50.00         2
3344296      14598   -73.19         2
3341758      14598    73.19         2

Now, we just put all this together.

Final query

WITH
CTE_rn
AS
(
    SELECT
        SysInvNum
        , REF1
        , LineBalance
        , ROW_NUMBER() OVER(PARTITION BY REF1, LineBalance ORDER BY SysInvNum) AS rn
        , COUNT(*) OVER(PARTITION BY REF1, ABS(LineBalance)) AS cc1
    FROM @T AS TT
)
, CTE_ToRemove
AS
(
    SELECT
        SysInvNum
        , REF1
        , LineBalance
        , COUNT(*) OVER(PARTITION BY REF1, rn) AS cc2
    FROM CTE_rn
    WHERE CTE_rn.cc1 > 1
)
SELECT *
FROM @T AS TT
WHERE
    TT.SysInvNum NOT IN 
    (
        SELECT CTE_ToRemove.SysInvNum
        FROM CTE_ToRemove
        WHERE CTE_ToRemove.cc2 = 2
    )
ORDER BY SysInvNum;

Result:

SysInvNum    REF1    LineBalance
13           100       50.00
22           200      -50.00
45           400       50.00
3344297      14602    -95.98
3344298      14598    281.47
3344299      14602    558.83

Note, that final result doesn't have any rows with REF = 300, because there were two corrected mistakes, that balanced each other completely.

like image 58
Vladimir Baranov Avatar answered Oct 05 '22 17:10

Vladimir Baranov


Most AR/billing systems treat "credit memos" (the negative amount) similar to cash, in which case the -73.19 would get applied to the 73.19 LineBalance the same as if the customer had paid that amount, resulting in a $0 balance.

OPTION 1:

Do you handle cash receipts and applications in this sytem? If so, you may be able to pull data from those cash application tables to show the tie between SysInvNum 3344296 and 3341758.

OPTION 2:

I'm assuming that PayAdjust column is used to reduce the balance after a customer has paid, and that LineBalance is a calculated column that is Charges + PayAdjust.

Most of the time when this occurs, the AR department would be responsible for applying the credit memo to an open invoice, so that the PayAdjust column would net $0 between the 2 rows, and this would cause the LineBalance to also be $0 on each of the 2 rows. It may just be a training issue for the system that is being used.

This would cause the 3 rows in question to look like this, so you don't have an issue, you would just exclude the rows by adding where LineBalance <> 0 to your query since the AR department (which applied the credit to begin with and so knows the answer to this question) explicitly stated which LineBalance the credit applies to:

Option 2 Preferred data structure:

SysInvNum   REF1        Charges               PayAdjust             LineBalance
----------- ----------- --------------------- --------------------- ---------------------
3344298     14598       281.47                0.00                  281.47
3344296     14598       -73.19                73.19                 0.00
3341758     14598       73.19                 -73.19                0.00

OPTION 3:

Without having this data from Option 1 or 2, you have make many assumptions and run the risk of inadvertently hiding the wrong rows.

That being said, here is a query that attempts to do what you are asking, but I would highly recommend checking with the AR dept to see if they can update "PayAdjust" for these records instead.

I added several test cases of scenarios that could cause issues, but this may not cover all the bases.

This query will only hide rows where one distinct matching negative value is found for a positive value, for the same REF1 and same DueDate. It also makes sure that the original Charge invoice ID is prior to the credit since it can be assumed that a credit would not occur prior to the actual charge (Test case 6 shows both rows still because the credit has an SysInvNum that occurred before the charge). If more than once match is found per REF1, DueDate, and LineBalance, then it will not hide the corresponding charge and credit lines (test cases 2 & 4). Test case 3 sums in total to 0, but it still shows all 3 rows because the LineBalance values do not match exactly. These are all assumptions that I made to handle edge cases, so they can be adjusted as needed.

CREATE TABLE #SysInvTable (SysInvNum int not null primary key, REF1 int, Charges money, PayAdjust money, LineBalance as Charges + PayAdjust, DueDate date, REF2 int, Remark varchar(50), REM varchar(50));

INSERT INTO #SysInvTable(SysInvNum, REF1, Charges, PayAdjust, DueDate, Remark) 
VALUES
    --.....................................
    --Your test case
      (3344298, 14598, 281.47, 0, '2014-12-08','Your original test case. This one should stay.')
    , (3344296, 14598, -73.19, 0, '2014-12-08',null)
    , (3341758, 14598, 73.19, 0, '2014-12-08',null)
    --.....................................
    --Test case 2: How do you match these up? 
    , (2001, 2, 73.19, 0, '2015-01-06','Charge 2.1')
    , (2002, 2, 73.19, 0, '2015-01-06','Charge 2.2')
    , (2003, 2, 73.19, 0, '2015-01-06','Charge 2.3')
    , (2004, 2, -73.19, 0, '2015-01-06','Credit for charge 2.3')
    , (2005, 2, -73.19, 0, '2015-01-06','Credit for charge 2.1') 
    --.....................................
    --Test case 3
    , (3001, 3, 73.19, 0, '2015-01-06','Charge 3.1')
    , (3002, 3, 73.19, 0, '2015-01-06','Charge 3.2')
    , (3003, 3, -146.38, 0, '2015-01-06','Credit for charges 3.1 and 3.2') 
    --.....................................
    --Test case 4: Do you hide 4001 or 4002? 
    , (4001, 4, 73.19, 0, '2015-01-06','Cable')
    , (4002, 4, 73.19, 0, '2015-01-06','Internet')
    , (4003, 4, -73.19, 0, '2015-01-06','Misc Credit')
    --.....................................
    --Test case 5: remove all lines except the first
    , (5000, 5, 9.99, 0, '2015-01-06','Charge 5.0 (Should stay)')
    , (5001, 5, 11.11, 0, '2015-01-06','Charge 5.1')
    , (5002, 5, 22.22, 0, '2015-01-06','Charge 5.2')
    , (5003, 5, 33.33, 0, '2015-01-06','Charge 5.3')
    , (5004, 5, -11.11, 0, '2015-01-06','Credit for charge 5.1')
    , (5005, 5, -33.33, 0, '2015-01-06','Credit for charge 5.3') 
    , (5006, 5, -22.22, 0, '2015-01-06','Credit for charge 5.2') 
    --.....................................
    --Test case 6: credit occurs before charge, so keep both
    , (6000, 6, -73.19, 0, '2015-01-06','Credit occurs before charge')
    , (6001, 6, 73.19, 0, '2015-01-06','Charge 6.1')
;

SELECT i.* 
FROM #SysInvTable i
WHERE i.SysInvNum not in 
(
    SELECT IngoreInvNum = case when c.N = 1 then max(t.SysInvNum) else min(t2.SysInvNum) end
    FROM #SysInvTable t
    INNER JOIN #SysInvTable t2 
        ON t.ref1 = t2.ref1 
        AND t.DueDate = t2.DueDate
    CROSS APPLY (SELECT 1 AS N UNION ALL SELECT 2 as N) AS c --used to both both T and T2 SysInvNum's to exclude
    WHERE 1=1
        AND t.LineBalance > 0 AND t2.LineBalance < 0
        AND t.SysInvNum < t2.SysInvNum --make sure the credit came in after the positive SysInvNum 
        AND t.LineBalance = t2.LineBalance * -1
    GROUP BY t.REF1, t.DueDate, abs(t.LineBalance), c.n
    HAVING Count(*) = 1
)
;

DROP TABLE #SysInvTable;
like image 34
BateTech Avatar answered Oct 05 '22 17:10

BateTech