Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

deleting duplicate row with no unique identifier

Tags:

sql

sql-server

I have some data in a table that looks roughly like the following:

table stockData
(
tickId int not null,
timestamp datetime not null,
price decimal(18,5) not null
)

Neither tickId nor timestamp are unique, however the combination of tickId and timestamp is supposed to be unique.

I have some duplicate data in my table, and I'm attempting to remove it. However, I'm coming to the conclusion that there is not enough information with the given data for me to discern one row from the other, and basically no way for me to delete just one of the duplicate rows. My guess is that I will need to introduce some sort of identity column, which would help me to identify one row from the other.

Is this correct, or is there some magic way of deleting one but not both of the duplicate data with a query?

EDIT Edited to clarify that tickId and timestamp combo should be unique, but it's not because of the duplicate data.

like image 752
steve8918 Avatar asked Mar 25 '12 18:03

steve8918


1 Answers

Here is a query that will remove duplicates and leave exactly one copy of each unique row. It will work with SQL Server 2005 or higher:

WITH Dups AS
(
  SELECT tickId, timestamp, price,
    ROW_NUMBER() OVER(PARTITION BY tickid, timestamp ORDER BY (SELECT 0)) AS rn
  FROM stockData
)
DELETE FROM Dups WHERE rn > 1
like image 98
Sergey Rybalkin Avatar answered Nov 14 '22 13:11

Sergey Rybalkin