Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server: Update only 1 of multiple completely duplicate records

This one is bugging me.

I am working in a data warehousing staging table where there can potentially be rows where the column values are 100% duplicates of others. There are dozens of columns, but for the sake of this argument, lets use the following example:

tblExample (ID Int, Active bit, ModifiedDate DateTime)

Now, at any given time there is only supposed to be one record per ID that has Active set to 1. All others should have active set to 0. There is a process that enforces this during the data loading.

That process can and has broken in the past, resulting in data like this:

ID     Active ModifiedDate
123456 0      2016-05-27 12:37:46.111
123456 1      2016-05-27 12:37:46.433
123456 1      2016-05-27 12:37:46.433

In that case there are 2 "Identical" records that have Active set to 1. I need to find a way to make only one of those records active = 1.

Now the process I am using to do this currently assumes that the Date value is unique and in 99.99% of the times that IS the case. But there are times when the date will also be duplicated. And I can't for the life of me figure out a way to update only a single one of those records, since I have nothing to latch on to for the WHERE.

  • I cannot add or modify the schema.
  • This is all happening inside ssis, so I am limited to the restrictions inherent with that platform (Some things just don't work well in SSIS)

Ideas?

like image 939
David Borneman Avatar asked Nov 27 '25 13:11

David Borneman


1 Answers

This should work:

with a as(
select *, ROW_NUMBER() OVER (PARTITION by ID, Active, ModifiedDate order by ModifiedDate) as rn from tblExample
)       

update a set active = 0 where rn >1
select * from tblExample;

Here is an example with your data.

Create a CTE with Row_number() for dupes on ModifiedDate(as your solution works for non-dupes of ModifiedDate) and updates the CTE, updating your data.

If you want to replace your process you could use the below:

with a as(
select *, ROW_NUMBER() OVER (PARTITION by ID, Active order by ModifiedDate desc) as rn from tblExample
)

update a set active = 0 where rn >1
select *, ROW_NUMBER() OVER (PARTITION by ID, A order by ModifiedDate desc) as rn from tblExample;

This only allows the most recent entry for each ID to be active

Alternative solution

like image 197
EoinS Avatar answered Nov 30 '25 04:11

EoinS



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!