Can you help me with SQL statements to find duplicates on multiple fields?
For example, in pseudo code:
select count(field1,field2,field3) from table where the combination of field1, field2, field3 occurs multiple times
and from the above statement if there are multiple occurrences I would like to select every record except the first one.
To select duplicate values, you need to create groups of rows with the same values and then select the groups with counts greater than one. You can achieve that by using GROUP BY and a HAVING clause.
To get the list of fields for which there are multiple records, you can use..
select field1,field2,field3, count(*) from table_name group by field1,field2,field3 having count(*) > 1
Check this link for more information on how to delete the rows.
http://support.microsoft.com/kb/139444
There should be a criterion for deciding how you define "first rows" before you use the approach in the link above. Based on that you'll need to use an order by clause and a sub query if needed. If you can post some sample data, it would really help.
You mention "the first one", so I assume that you have some kind of ordering on your data. Let's assume that your data is ordered by some field ID
.
This SQL should get you the duplicate entries except for the first one. It basically selects all rows for which another row with (a) the same fields and (b) a lower ID exists. Performance won't be great, but it might solve your problem.
SELECT A.ID, A.field1, A.field2, A.field3 FROM myTable A WHERE EXISTS (SELECT B.ID FROM myTable B WHERE B.field1 = A.field1 AND B.field2 = A.field2 AND B.field3 = A.field3 AND B.ID < A.ID)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With