Delete duplicate rows from a BigQuery table

Tags:

distinct

google-bigquery

I have a table with >1M rows of data and 20+ columns.

Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).

If possible, I would like to keep the original table name and remove the duplicate records from the problematic column. Otherwise, I could create a new table (tableXfinal) with the same schema but without the duplicates.

I am not proficient in SQL or any other programming language so please excuse my ignorance.

delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index in (
  select Fixed_Accident_Index
  from Accidents.CleanedFilledCombined
  group by Fixed_Accident_Index
  having count(Fixed_Accident_Index) > 1
);


asked Apr 17 '16 10:04

TheGoat

1 Answer

You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).

A query that should work is here:

SELECT *
FROM (
  SELECT
      *,
      ROW_NUMBER()
          OVER (PARTITION BY Fixed_Accident_Index)
          row_number
  FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
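To see what this kind of rewrite does, here is a minimal local sketch in Python. It is not BigQuery code: SQLite (3.25 or newer, for window functions) stands in as the engine, and the table `accidents`, its two columns, the sample rows, and the helper `dedupe_by_column` are all made up for illustration. The point it shows is that the ROW_NUMBER approach keeps one row per distinct value, whereas a DELETE that matches every value with a count above 1 would remove all copies of a duplicated value, including the one you want to keep.

```python
import sqlite3


def dedupe_by_column(conn, table, key_column):
    """Rewrite `table`, keeping one row per distinct value of `key_column`.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly into the SQL, so pass trusted names only.
    """
    cur = conn.cursor()
    # Collect the original column names so the helper column `rn` is dropped.
    cols = [row[1] for row in cur.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(cols)
    # Number the rows within each group of equal key values, keep row 1 only.
    cur.execute(f"""
        CREATE TABLE {table}_dedup AS
        SELECT {col_list} FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY {key_column}) AS rn
            FROM {table}
        ) WHERE rn = 1
    """)
    # Swap the deduplicated copy in under the original table name.
    cur.execute(f"DROP TABLE {table}")
    cur.execute(f"ALTER TABLE {table}_dedup RENAME TO {table}")
    conn.commit()


# Made-up sample data: A1 and C3 each appear twice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accidents (fixed_accident_index TEXT, severity INTEGER)"
)
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [("A1", 1), ("A1", 1), ("B2", 3), ("C3", 2), ("C3", 2)],
)

dedupe_by_column(conn, "accidents", "fixed_accident_index")

remaining = conn.execute(
    "SELECT fixed_accident_index FROM accidents ORDER BY 1"
).fetchall()
column_names = [row[1] for row in conn.execute("PRAGMA table_info(accidents)")]
```

After the call, `remaining` holds one row each for A1, B2 and C3, and the table keeps its original two columns. Building the result in a separate table first and swapping it in afterwards mirrors the "create a new table, verify, then copy over" route described above.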


answered Sep 22 '22 10:09

Jordan Tigani
