What's the best way to dedupe a table?

I've seen a couple of solutions for this, but I'm wondering what the best and most efficient way is to de-dupe a table. You can use code (SQL, etc.) to illustrate your point, but I'm just looking for basic algorithms. I assumed there would already be a question about this on SO, but I wasn't able to find one, so if it already exists just give me a heads up.

(Just to clarify - I'm referring to getting rid of duplicates in a table that has an incremental automatic PK and has some rows that are duplicates in everything but the PK field.)

738

asked Feb 09 '10 15:02

froadie

1 Answer

SELECT DISTINCT <insert all columns but the PK here> FROM foo. Create a temp table from that query (the syntax varies by RDBMS, but there's typically a SELECT … INTO or CREATE TABLE AS pattern available), then empty the old table and insert the rows from the temp table back into it. Note that the reinserted rows get newly generated PK values, so this only works cleanly if nothing else references the old ones.
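
As a rough illustration of the steps above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (foo with an auto-generated id plus name and email columns), the temp table name foo_dedup, and the helper function name are all assumptions made up for the example, and the CREATE TABLE AS syntax shown is SQLite's; other RDBMSs will differ.

```python
import sqlite3


def dedupe_via_temp_table(conn):
    """Rebuild foo so each distinct (name, email) pair appears once.

    Hypothetical helper: the table and column names are illustrative only.
    """
    cur = conn.cursor()
    # Step 1: copy the distinct non-PK column combinations into a temp table.
    cur.execute(
        "CREATE TEMP TABLE foo_dedup AS SELECT DISTINCT name, email FROM foo"
    )
    # Step 2: empty the original table.
    cur.execute("DELETE FROM foo")
    # Step 3: insert the distinct rows back; the PK column is omitted,
    # so every row receives a newly generated id.
    cur.execute(
        "INSERT INTO foo (name, email) SELECT name, email FROM foo_dedup"
    )
    # Clean up the temp table and make the changes permanent.
    cur.execute("DROP TABLE foo_dedup")
    conn.commit()


# Small in-memory demonstration with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO foo (name, email) VALUES (?, ?)",
    [
        ("alice", "a@example.com"),
        ("bob", "b@example.com"),
        ("alice", "a@example.com"),
        ("bob", "b@example.com"),
        ("bob", "b@example.com"),
    ],
)
dedupe_via_temp_table(conn)
rows = conn.execute("SELECT name, email FROM foo ORDER BY name").fetchall()
```

On a large table it is probably worth running the delete and reinsert inside a single transaction, as the sketch does, so that a failure partway through does not leave the table empty.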

100

answered Sep 20 '22 17:09

Hank Gay

Tags:

performance

algorithm

sql

duplicates
