Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Deleting Duplicate Row Combinations in a given Date

So I have a dataset of about 160 000 entries, they are computer generated and over the years mistakes happened.

Lets say the Table has the following columns:

- EntryID (auto int)
- FruitNumber
- JuiceNumber
- CandyNumber
- Date

Now the important thing is each combination of FruitNumber, JuiceNumber,CandyNumber is Unique when the time between them is less than 12 Months.

That means every exact combination of these can only exist once in 12 months. Now I need to get this dataset migrated into a new data model and for this I need to delete duplicate records (but keep 1 of them), I tried around alot with Queries but wasn´t able to find a solution.

like image 526
Badgy Avatar asked Nov 07 '22 19:11

Badgy


1 Answers

Try to use cte:

;WITH cte AS 
(
SELECT 
  ft.EntryID
, ft.FruitNumber
, ft.JuiceNumber
, ft.CandyNumber
, ft.Date
, ROW_NUMBER() OVER (PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber 
     ORDER BY ft.FruitNumber) RN
, DENSE_RANK() OVER (ORDER BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber) 
     AS Partitionid
, COUNT(1) OVER (PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber 
     ORDER BY ft.FruitNumber) as PartitionCNT
FROM FooTable ft
)

SELECT 
t1.* 
, DATEDIFF(DAY, t.Date, t1.Date) DATEDiff
FROM 
cte t 
INNER JOIN cte t1 
    ON t1.FruitNumber = t.FruitNumber
        AND  t1.JuiceNumber = t.JuiceNumber
        AND  t1.CandyNumber = t.CandyNumber
        AND DATEDIFF(DAY, t.Date, t1.Date)>= 365
WHERE t.PartitionCNT > 1

And the sample data:

CREATE TABLE FooTable
(
    EntryID INT IDENTITY(1, 1) PRIMARY KEY,
    FruitNumber INT,
    JuiceNumber INT,
    CandyNumber INT,
    [Date] DATETIME
);*/


INSERT INTO FooTable
VALUES
(1, 2, 3 , '2019-03-01 00:00:00.000'),
(1, 2, 3 , '2020-03-01 00:00:00.000'),
(4, 5, 6 , '2019-03-01 00:00:00.000'),
(7, 8, 9 , '2019-03-01 00:00:00.000'),
(10, 11, 12 , '2018-03-20 00:00:00.000'),
(13, 14, 15 , '2018-03-20 00:00:00.000'),
(16, 17, 18 , '2017-03-09 00:00:00.000'),
(16, 17, 18 , '2017-02-09 00:00:00.000'),
(22, 23, 34 , '2017-02-12 00:00:00.000'),
(22, 23, 34 , '2017-02-12 00:00:00.000');

And OUTPUT:

EntryID FruitNumber JuiceNumber CandyNumber
   2           1           2          3 
like image 144
StepUp Avatar answered Nov 12 '22 11:11

StepUp