Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best way to dedupe a table?

I've seen a couple of solutions for this, but I'm wondering what the best and most efficient way is to de-dupe a table. You can use code (SQL, etc.) to illustrate your point, but I'm just looking for basic algorithms. I assumed there would already be a question about this on SO, but I wasn't able to find one, so if it already exists just give me a heads up.

(Just to clarify - I'm referring to getting rid of duplicates in a table that has an incremental automatic PK and has some rows that are duplicates in everything but the PK field.)

like image 738
froadie Avatar asked Feb 09 '10 15:02

froadie


People also ask

Which command is used to remove duplicate values of a table?

We can use Common Table Expressions commonly known as CTE to remove duplicate rows in SQL Server. It is available starting from SQL Server 2005. We use a SQL ROW_NUMBER function, and it adds a unique sequential row number for the row.

How can you delete duplicate records in a table with no primary key?

DELETE Duplicate Records Using ROWCOUNT So to delete the duplicate record with SQL Server we can use the SET ROWCOUNT command to limit the number of rows affected by a query. By setting it to 1 we can just delete one of these rows in the table.


1 Answers

SELECT DISTINCT <insert all columns but the PK here> FROM foo. Create a temp table using that query (the syntax varies by RDBMS but there's typically a SELECT … INTO or CREATE TABLE AS pattern available), then blow away the old table and pump the data from the temp table back into it.

like image 100
Hank Gay Avatar answered Sep 20 '22 17:09

Hank Gay