I have some data in a table that looks roughly like the following:
table stockData
(
tickId int not null,
timestamp datetime not null,
price decimal(18,5) not null
)
Neither tickId nor timestamp are unique, however the combination of tickId and timestamp is supposed to be unique.
I have some duplicate data in my table, and I'm attempting to remove it. However, I'm coming to the conclusion that there is not enough information with the given data for me to discern one row from the other, and basically no way for me to delete just one of the duplicate rows. My guess is that I will need to introduce some sort of identity column, which would help me to identify one row from the other.
Is this correct, or is there some magic way of deleting one but not both of the duplicate data with a query?
EDIT Edited to clarify that tickId and timestamp combo should be unique, but it's not because of the duplicate data.
Here is a query that will remove duplicates and leave exactly one copy of each unique row. It will work with SQL Server 2005 or higher:
WITH Dups AS
(
SELECT tickId, timestamp, price,
ROW_NUMBER() OVER(PARTITION BY tickid, timestamp ORDER BY (SELECT 0)) AS rn
FROM stockData
)
DELETE FROM Dups WHERE rn > 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With