Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MERGing Duplicate Records in SQL Server

SQL Fiddle

I have the following table

CREATE TABLE __EpiTest
(
    `ActivityRecordID` int, 
    `ActCstID` varchar(6), 
    `ResCstID` varchar(6), 
    `VolAmt` int, 
    `ActCnt` int, 
    `TotOCst` int, 
    `TotCst` int
);

INSERT INTO __EpiTest (`ActivityRecordID`, `ActCstID`, `ResCstID`, `VolAmt`, `ActCnt`, `TotOCst`, `TotCst`)
VALUES (15652, 'DIM008', 'CPF005', 30.455249786377, 1, 0, 0.375024198767061),
       (15652, 'DIM008', 'CSC004', 30.455249786377, 1, 7.62176510799961, 11.932578069479),
       (15652, 'DIM008', 'REC001', 30.455249786377, 1, 0.17902367836393, 0.384881520159455),
       (15652, 'OUT001', 'CPF002', 15, 0, 0, 16.9408193013078),
       (15652, 'OUT001', 'CSC001', 15, 0, 2.36971564207042,  2.36971564207042),
       (15652, 'OUT001', 'CSC004', 15, 0, 12.3230666021278, 12.3760690367354),
       (15652, 'OUT001', 'REC001', 15, 0, 0.377459387378349, 3.0275278374102),
       (15652, 'SUP001', 'CPF002', 1, 1, 0, 0.00108648359810756),
       (15652, 'SUP001', 'CPF011', 1, 1, 0, -1.89799880202357E-14),
       (15652, 'SUP001', 'CPF020', 1, 1, 0, 1.31058251625567E-05),
       (15652, 'SUP001', 'CPF021', 1, 1, 0, 25.0942308512551),
       (15652, 'SUP001', 'CPF021', 1, 1, 0, 25.0942308512551),
       (15652, 'SUP001', 'CSC001', 1, 1, 1.9628769103451, 1.9628769103451),
       (15652, 'SUP001', 'CSC001', 1, 1, 1.9628769103451, 1.9628769103451),
       (15652, 'SUP001', 'CSC002', 1, 1, 0, 10.2266625467779),
       (15652, 'SUP001', 'CSC004', 1, 1, 16.3451721608005, 16.3513319060046),
       (15652, 'SUP001', 'CSC004', 1, 1, 16.3451721608005, 16.3513319060046),
       (15652, 'SUP001', 'REC001', 1, 1, 0.254410386701976, -6.27048795659376),
       (15652, 'SUP001', 'REC001', 1, 1, 0.254410386701976, -6.27048795659376),
       (15652, 'SUP001', 'REC002', 1, 1, 0, 1.10781732547441);

Notice there are rows which have matching values for [ActivityRecordID], [ActCstID] and [ResCstID]. I want to merge these values and sum the values in [TotOCst] and [TotCst]. To do this I have attempted to use MERGE

MERGE [__EpiTest] AS Tgt 
USING (
    SELECT [ActivityRecordID], 
           [ActCstID], 
           [ResCstID], 
           SUM([TotOCst]) AS TotOCst, 
           SUM([TotCst]) AS TotCst 
    FROM [__EpiTest]  
    GROUP BY [ActivityRecordID], 
             [ActCstID], 
             [ResCstID]) AS Src 
ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
     Tgt.[ActCstID] = Src.[ActCstID] AND 
     Tgt.[ResCstID] = Src.[ResCstID])  
WHEN MATCHED THEN 
    UPDATE 
    SET [TotOCst] = Src.[TotOCst], 
        [TotCst] = Src.[TotCst] 
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;
GO

This matches and correctly updates the values in the [TotOCst] and [TotCst] for each duplicate, but it then leaves the duplicate rows in the table whereas I want all but one removed. How can I achieve this?

Note, the target table is HUGE, so I would like to try and do this with a single operation by using a variation of the MERGE query above, or other alternative. Multiple queries will be too expensive for me to deal with...


Illustration

I get

...
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255
15652   SUP001  CSC002  1   1   0                   10.2266625467779
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
...

But I want

...
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255 
15652   SUP001  CSC002  1   1   0                   10.2266625467779
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
...
like image 605
MoonKnight Avatar asked Sep 28 '17 13:09

MoonKnight


2 Answers

In order to get this to work in the MERGE statement, you'll need to be a little more specific about the rows. I've adjusted the query below:

MERGE [__EpiTest] AS Tgt 
USING (
    SELECT [ActivityRecordID], 
           [ActCstID], 
           [ResCstID], 
           SUM([TotOCst]) AS TotOCst, 
           SUM([TotCst]) AS TotCst ,
           VolAmt,
           ActCnt
    FROM [__EpiTest]  
    GROUP BY [ActivityRecordID], 
             [ActCstID], 
             [ResCstID], 
             VolAmt,
             ActCnt) AS Src 
ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
     Tgt.[ActCstID] = Src.[ActCstID] AND 
     Tgt.[ResCstID] = Src.[ResCstID] AND
     Tgt.TotOCst = Src.TotOCst AND
     Tgt.TotCst = Src.TotCst
     ) 
WHEN NOT MATCHED BY TARGET THEN
    INSERT ( ActivityRecordID, ActCstID, ResCstID, TotOCst, TotCst, VolAmt, ActCnt )
    VALUES ( ActivityRecordID, ActCstID, ResCstID, TotOCst, TotCst, VolAmt, ActCnt )
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;
GO

Basically, I've changed it so that the source of the merge include the other columns that seem constant, so that they can be inserted later on. I've changed the MATCH condition to be basically "the row is exactly the same and will be unchanged" and so removed the WHEN MATCHED statement.

I then added an INSERT statement, so that for ones that have changed (that will have their rows deleted by the NOT MATCHED BY TARGET), a new row is inserted with these values.

In your same data, this returns a result set of:

ActivityRecordID    ActCstID    ResCstID    VolAmt  ActCnt  TotOCst TotCst
15652   DIM008  CPF005  30.455249786377 1   0   0.375024198767061
15652   DIM008  CSC004  30.455249786377 1   7.62176510799961    11.932578069479
15652   DIM008  REC001  30.455249786377 1   0.17902367836393    0.384881520159455
15652   OUT001  CPF002  15  0   0   16.9408193013078
15652   OUT001  CSC001  15  0   2.36971564207042    2.36971564207042
15652   OUT001  CSC004  15  0   12.3230666021278    12.3760690367354
15652   OUT001  REC001  15  0   0.377459387378349   3.0275278374102
15652   SUP001  CPF002  1   1   0   0.00108648359810756
15652   SUP001  CPF011  1   1   0   -1.89799880202357E-14
15652   SUP001  CPF020  1   1   0   1.31058251625567E-05
15652   SUP001  CSC002  1   1   0   10.2266625467779
15652   SUP001  REC002  1   1   0   1.10781732547441
15652   SUP001  CPF021  1   1   0   50.1884617025102
15652   SUP001  CSC001  1   1   3.9257538206902 3.9257538206902
15652   SUP001  CSC004  1   1   32.690344321601 32.7026638120092
15652   SUP001  REC001  1   1   0.508820773403952   -12.5409759131875
like image 89
TZHX Avatar answered Oct 06 '22 19:10

TZHX


In the __EpiTest has duplicate values

SELECT * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001'
order by ActivityRecordID ,  ActCstID , ResCstID;
ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566

To simple select the single values

  SELECT DISTINCT * FROM @__EpiTest 
    where ResCstID = 'REC001' and ActCstID ='SUP001'
    order by ActivityRecordID ,  ActCstID , ResCstID;
    ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
    ---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
    15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566

To delete

To delete duplicated values and maintain the frist

;WITH cte as (
SELECT Row_number() OVER (PARTITION BY ActivityRecordID ,  ActCstID , ResCstID ORDER BY (SELECT NULL)) Rn, * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001'
)
Delete from cte where Rn > 1 

SELECT * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001' 
ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566

Using the Merge - Be careful, merge statament has serious preformance problems

  ;WITH myResult as (
    SELECT Row_number() OVER (PARTITION BY ActivityRecordID ,  ActCstID , ResCstID ORDER BY (SELECT NULL)) Rn, * FROM @__EpiTest 
    )
    MERGE myResult AS Tgt 
    USING myResult AS Src 
    ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
         Tgt.[ActCstID] = Src.[ActCstID] AND 
         Tgt.[ResCstID] = Src.[ResCstID] AND 
         Tgt.Rn = Src.Rn AND
         Src.Rn = 1)  
    WHEN MATCHED THEN 
        UPDATE 
        SET [TotOCst] = Src.[TotOCst], 
            [TotCst] = Src.[TotCst] 
    WHEN NOT MATCHED BY SOURCE THEN 
        DELETE;
     SELECT * FROM @__EpiTest 
    where ResCstID = 'REC001' and ActCstID ='SUP001' 
Rn                   ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
-------------------- ---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
1                    15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
2                    15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566

(2 row(s) affected)

(20 row(s) affected)

ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
like image 30
Maurício Pontalti Neri Avatar answered Oct 06 '22 17:10

Maurício Pontalti Neri