Delete duplicate rows from a BigQuery table

I have a table with >1M rows of data and 20+ columns.

Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).

If possible, I would like to keep the original table name and remove the rows that are duplicated in that problematic column. Otherwise, I could create a new table (tableXfinal) with the same schema but without the duplicates.

I am not proficient in SQL or any other programming language so please excuse my ignorance.

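-- Caution: this deletes every row whose Fixed_Accident_Index appears more than once,
-- including the copy you would want to keep.
delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index in (
  select Fixed_Accident_Index
  from Accidents.CleanedFilledCombined
  group by Fixed_Accident_Index
  having count(Fixed_Accident_Index) > 1
);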
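To see why a plain DELETE like the one above does not deduplicate, here is a small local sketch. It uses SQLite through Python's standard sqlite3 module as a stand-in for BigQuery; the table name `accidents`, the `Severity` column and the sample rows are hypothetical. The same IN / GROUP BY / HAVING logic removes both copies of a duplicated index, leaving none of them behind.

```python
import sqlite3

# Hypothetical miniature stand-in for Accidents.CleanedFilledCombined:
# index "A" appears twice, "B" once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (Fixed_Accident_Index TEXT, Severity INTEGER)")
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [("A", 1), ("A", 1), ("B", 2)],
)

# Same shape as the DELETE in the question: the subquery finds every index
# that occurs more than once, and the DELETE removes ALL rows with that index.
conn.execute(
    """
    DELETE FROM accidents
    WHERE Fixed_Accident_Index IN (
        SELECT Fixed_Accident_Index
        FROM accidents
        GROUP BY Fixed_Accident_Index
        HAVING COUNT(Fixed_Accident_Index) > 1
    )
    """
)

remaining = conn.execute(
    "SELECT Fixed_Accident_Index FROM accidents ORDER BY Fixed_Accident_Index"
).fetchall()
print(remaining)  # [('B',)] -- both "A" rows are gone
```

Keeping exactly one row per index needs a query that can tell the copies apart, which is what a ROW_NUMBER-based rewrite provides.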
asked Apr 17 '16 10:04 by TheGoat



People also ask

How do I remove duplicate rows from a query?

The go-to solution for removing duplicate rows from a result set is to add the DISTINCT keyword to your SELECT statement. It tells the query engine to remove duplicates so that every row in the result set is unique.
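One point worth checking before relying on DISTINCT here: it compares whole rows, so it only collapses rows that are identical in every selected column. The sketch below (SQLite via Python's sqlite3, hypothetical table and data) shows an exact copy being removed while a row that shares the index but differs in another column survives, which is the situation described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (Fixed_Accident_Index TEXT, Severity INTEGER)")
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [("A", 1), ("A", 1), ("A", 3), ("B", 2)],
)

# DISTINCT works on entire rows: the exact copy ("A", 1) collapses to one row,
# but ("A", 3) is kept because its Severity differs.
distinct_rows = conn.execute(
    "SELECT DISTINCT Fixed_Accident_Index, Severity FROM accidents "
    "ORDER BY Fixed_Accident_Index, Severity"
).fetchall()
print(distinct_rows)  # [('A', 1), ('A', 3), ('B', 2)]
```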

How do I delete a row in BigQuery?

Use the DELETE statement when you want to delete rows from a table. To delete all rows in a table, use the TRUNCATE TABLE statement.

What causes duplicates in SQL?

Are you joining tables that have a one-to-many relationship? That often returns the same parent row multiple times, once for each child row connected to it. The other, more obvious, reason is that the table actually contains duplicate data.
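The one-to-many case can be reproduced in a few lines. This is a sketch using SQLite via Python's sqlite3; the `accidents` and `vehicles` tables and their columns are hypothetical. One parent row joined to two child rows comes back twice, even though only parent columns are selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical parent/child tables: one accident involving two vehicles.
conn.execute("CREATE TABLE accidents (accident_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE vehicles (vehicle_id INTEGER, accident_id TEXT)")
conn.execute("INSERT INTO accidents VALUES ('A')")
conn.executemany("INSERT INTO vehicles VALUES (?, ?)", [(1, "A"), (2, "A")])

# The join produces one output row per matching child, so the parent's
# accident_id appears once for each vehicle.
joined = conn.execute(
    "SELECT a.accident_id FROM accidents a "
    "JOIN vehicles v ON v.accident_id = a.accident_id"
).fetchall()
print(joined)  # [('A',), ('A',)]
```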


1 Answer

You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).

Here is a query that should work:

SELECT * EXCEPT(row_number)  -- drop the helper column (standard SQL)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) row_number
  FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1  -- keep one row per index
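The rewrite-and-replace workflow described above can be tried end to end on a toy table. This is a minimal sketch using SQLite via Python's sqlite3 as a local stand-in for BigQuery (window functions need SQLite 3.25 or newer); the table names `accidents` and `accidents_final` and the sample rows are hypothetical. It writes one row per Fixed_Accident_Index to a new table, checks that each index now appears once, and then swaps the new table in under the original name. Without an ORDER BY inside OVER (...), which copy receives row number 1 is unspecified, so the check looks only at the indices.

```python
import sqlite3

# Hypothetical local stand-in for Accidents.CleanedFilledCombined.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (Fixed_Accident_Index TEXT, Severity INTEGER)")
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [("A", 1), ("A", 1), ("A", 3), ("B", 2)],
)

# Step 1: write one row per index into a new table (the "tableXfinal" route).
# Listing the columns explicitly leaves out the helper row_number column.
conn.execute(
    """
    CREATE TABLE accidents_final AS
    SELECT Fixed_Accident_Index, Severity
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS row_number
        FROM accidents
    )
    WHERE row_number = 1
    """
)

# Step 2: verify before touching the original: each index should appear once.
total, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT Fixed_Accident_Index) FROM accidents_final"
).fetchone()
print(total, distinct_ids)  # 2 2

# Step 3: replace the original table with the deduplicated copy.
conn.execute("DROP TABLE accidents")
conn.execute("ALTER TABLE accidents_final RENAME TO accidents")

final_ids = conn.execute(
    "SELECT Fixed_Accident_Index FROM accidents ORDER BY Fixed_Accident_Index"
).fetchall()
print(final_ids)  # [('A',), ('B',)]
```

In BigQuery itself, the equivalent of steps 1 and 3 would be to run the query above with the original table as the destination (overwriting it), or as a `CREATE OR REPLACE TABLE ... AS SELECT ...` statement; adding an ORDER BY to the OVER clause lets you choose which duplicate is kept.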
answered Sep 22 '22 10:09 by Jordan Tigani
