I have a table with >1M rows of data and 20+ columns.
Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).
If possible, I would like to keep the original table name and remove the duplicate records in my problematic column; otherwise, I could create a new table (tableXfinal) with the same schema but without the duplicates.
I am not proficient in SQL or any other programming language, so please excuse my ignorance.
DELETE FROM Accidents.CleanedFilledCombined WHERE Fixed_Accident_Index IN (SELECT Fixed_Accident_Index FROM Accidents.CleanedFilledCombined GROUP BY Fixed_Accident_Index HAVING COUNT(Fixed_Accident_Index) > 1);
The go-to solution for removing duplicate rows from a result set is to include the DISTINCT keyword in your SELECT statement. It tells the query engine to remove duplicates so that every row in the result set is unique.
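As a rough illustration of what DISTINCT does and does not do, here is a minimal sketch using Python's built-in sqlite3 module rather than the asker's actual database. The table name tableX and column troubleColumn follow the question; the severity column and the sample values are made up for the example. DISTINCT compares whole result rows, so it only collapses rows that match in every selected column.

```python
import sqlite3

# In-memory stand-in for the real table; "severity" is a hypothetical second column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (troubleColumn TEXT, severity INTEGER)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?)",
    [("A1", 1), ("A1", 1), ("A2", 2), ("A2", 3)],
)

# DISTINCT over both columns: only the fully identical ("A1", 1) pair collapses.
distinct_rows = conn.execute(
    "SELECT DISTINCT troubleColumn, severity FROM tableX "
    "ORDER BY troubleColumn, severity"
).fetchall()

# DISTINCT over the key alone: one row per key, but the other columns are gone.
distinct_keys = conn.execute(
    "SELECT DISTINCT troubleColumn FROM tableX ORDER BY troubleColumn"
).fetchall()

print(distinct_rows)
print(distinct_keys)
```

So DISTINCT on its own is enough when the duplicate rows are identical in every column, but not when only one column repeats and the rest of each row should be kept.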
Use the DELETE statement when you want to delete rows from a table. To delete all rows in a table, use the TRUNCATE TABLE statement.
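One thing to watch with DELETE here: a filter of the form "key IN (keys that occur more than once)" matches every copy of a duplicated key, not just the extras. The sketch below shows this on a tiny in-memory SQLite table via Python's sqlite3 module. The table and column names follow the question, while the note column and the sample rows are invented for illustration. SQLite has no TRUNCATE TABLE statement, so only DELETE is shown.

```python
import sqlite3

# Tiny stand-in table; "note" is a hypothetical column used to tell rows apart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (troubleColumn TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?)",
    [("A1", "first"), ("A1", "second"), ("A2", "only")],
)

# Delete every row whose key appears more than once in the table.
conn.execute(
    """
    DELETE FROM tableX
    WHERE troubleColumn IN (
        SELECT troubleColumn
        FROM tableX
        GROUP BY troubleColumn
        HAVING COUNT(troubleColumn) > 1
    )
    """
)

# Both "A1" rows are gone, including the one that should have been kept.
remaining = conn.execute("SELECT troubleColumn, note FROM tableX").fetchall()
print(remaining)
```

If the goal is to keep one row per key, this pattern removes too much, which is one reason to rewrite the table from a de-duplicating query instead.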
Are you joining tables with one-to-many relationships? This will often result in the same rows being returned multiple times when there are multiple child rows connected to a parent. The other, more obvious reason could be that you actually have duplicate data in your database.
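To make the one-to-many case concrete, here is a small sketch with Python's sqlite3 module. The accident and vehicle tables and their columns are hypothetical, chosen only to show a parent with several children. Neither table contains a duplicate, yet the join returns the parent key once per matching child.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical parent and child tables; neither contains duplicates.
    CREATE TABLE accident (accident_id TEXT PRIMARY KEY);
    CREATE TABLE vehicle (vehicle_id INTEGER PRIMARY KEY, accident_id TEXT);
    INSERT INTO accident VALUES ('A1'), ('A2');
    INSERT INTO vehicle VALUES (1, 'A1'), (2, 'A1'), (3, 'A2');
    """
)

# 'A1' has two vehicles, so it appears twice in the joined result.
joined = conn.execute(
    """
    SELECT a.accident_id
    FROM accident AS a
    JOIN vehicle AS v ON v.accident_id = a.accident_id
    ORDER BY a.accident_id
    """
).fetchall()
print(joined)
```

If the duplicates in your table came from a join like this, fixing the query that built the table may be cleaner than de-duplicating afterwards.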
You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).
A query that should work is here:
SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) row_number FROM Accidents.CleanedFilledCombined ) WHERE row_number = 1
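The same ROW_NUMBER() pattern can be tried out on a small in-memory SQLite table through Python's sqlite3 module (window functions need SQLite 3.25 or newer). This is a sketch under assumptions, not the exact query for your database: the names tableX, troubleColumn and tableXfinal come from the question, while the note column, the sample rows and the ORDER BY inside the window are additions for the example. Without an ORDER BY, which duplicate gets row number 1 is not guaranteed. Listing the columns explicitly in the outer SELECT also keeps the helper column out of the new table.

```python
import sqlite3

# Stand-in for the real table; "note" is a hypothetical extra column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (troubleColumn TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?)",
    [("A1", "first"), ("A1", "second"), ("A2", "only"), ("A3", "x"), ("A3", "y")],
)

# Number the rows within each key, then keep only the first row per key.
# ORDER BY note is an assumption that makes the choice of survivor deterministic.
conn.execute(
    """
    CREATE TABLE tableXfinal AS
    SELECT troubleColumn, note
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY troubleColumn ORDER BY note
               ) AS rn
        FROM tableX
    )
    WHERE rn = 1
    """
)

deduped = conn.execute(
    "SELECT troubleColumn, note FROM tableXfinal ORDER BY troubleColumn"
).fetchall()
print(deduped)
```

Writing to a new table first lets you check the row count and spot-check a few keys before replacing the original table.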