Say i have duplicate rows in my table and well my database design is of 3rd class :-
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Lux','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Crowning Glory','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (2,'Cinthol','nice soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');
I want only 1 instance of each row should be present in my table. Thus 2nd, 3rd and last row
whcih are completely identical should be deleted. What query can i write for this? Can it be done without creating temp tables? Just in one single query?
Thanks in advance :)
Select the range you want to remove duplicate rows. If you want to delete all duplicate rows in the worksheet, just hold down Ctrl + A key to select the entire sheet. 2. On Data tab, click Remove Duplicates in the Data Tools group.
Be careful when using the Remove Duplicates tool in Excel. Remove duplicates DOES NOT work if number formats are different. It is very literal in that even if the format is different it will assume it is not a duplicate. So the numbers 10 and 10.00 will be seen as unique and not deleted.
Try this - it will delete all duplicates from your table:
;WITH duplicates AS
(
SELECT
ProductID, ProductName, Description, Category,
ROW_NUMBER() OVER (PARTITION BY ProductID, ProductName
ORDER BY ProductID) 'RowNum'
FROM dbo.tblProduct
)
DELETE FROM duplicates
WHERE RowNum > 1
GO
SELECT * FROM dbo.tblProduct
GO
Your duplicates should be gone now: output is:
ProductID ProductName DESCRIPTION Category
1 Cinthol cosmetic soap soap
1 Lux cosmetic soap soap
1 Crowning Glory cosmetic soap soap
2 Cinthol nice soap soap
3 Lux nice soap soap
DELETE tblProduct
FROM tblProduct
LEFT OUTER JOIN (
SELECT MIN(ProductId) as ProductId, ProductName, Description, Category
FROM tblProduct
GROUP BY ProductName, Description, Category
) as KeepRows ON
tblProduct.ProductId= KeepRows.ProductId
WHERE
KeepRows.ProductId IS NULL
Stolen from How can I remove duplicate rows?
UPDATE:
This will only work if ProductId is a Primary Key (which it is not). You are better off using @marc_s' method, but I'll leave this up in case someone using a PK comes across this post.
I had to do this a few weeks back... what version of SQL Server are you using? In SQL Server 2005 and up, you can use Row_Number as part of your select, and only select where Row_Number is 1. I forget the exact syntax, but it's well documented... something along the lines of:
Select t0.ProductID,
t0.ProductName,
t0.Description,
t0.Category
Into tblCleanData
From (
Select ProductID,
ProductName,
Description,
Category,
Row_Number() Over (
Partition By ProductID,
ProductName,
Description,
Category
Order By ProductID,
ProductName,
Description,
Category
) As RowNumber
From MyTable
) As t0
Where t0.RowNumber = 1
Check out http://msdn.microsoft.com/en-us/library/ms186734.aspx, that should get you going in the right direction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With