BigQuery Standard SQL: Delete Duplicates from Table

Tags:

google-bigquery

I am using the query below to delete duplicate records from a BigQuery table using Standard SQL, but it throws an error:

with cte as (
select * ,row_number()over (partition by CallRailCallId order by CallRailCallId) as rn
from `encoremarketingtest.EncoreMarketingTest.CallRailCall2` )

delete
 from cte
where rn>1

Query Failed
Error: Syntax error: Expected "(" or keyword SELECT but got keyword DELETE at [5:5]

Could anyone help me with the correct approach in BigQuery?

asked May 25 '18 07:05

1 Answer

Option #1

CREATE OR REPLACE TABLE `project.dataset.your_table` AS
SELECT * EXCEPT(rn)
FROM (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY CallRailCallId ORDER BY CallRailCallId) rn
  FROM `project.dataset.your_table`
) 
WHERE rn = 1

Option #2

CREATE OR REPLACE TABLE `project.dataset.your_table` AS
SELECT row.*
FROM (
  SELECT ARRAY_AGG(t ORDER BY CallRailCallId LIMIT 1)[OFFSET(0)] row
  FROM `project.dataset.your_table` t
  GROUP BY CallRailCallId
)

As you might have noticed, the options above use the DDL (CREATE TABLE) approach, which is why they can work with just the one column known from your question, CallRailCallId.
Also note that ORDER BY CallRailCallId plays no real role here, because it orders by exactly the same field used in GROUP BY and PARTITION BY. If you order by a different field, that field controls which row (out of several duplicates) "survives". For example, ORDER BY ts DESC keeps the most recent row (see the option below for what ts might be).
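To make the ORDER BY point concrete, here is a minimal sketch of Option #1 rewritten so that the newest row per key survives. It assumes the table has a ts TIMESTAMP column, which is hypothetical and not part of the original question; substitute whatever column should decide the winner.

```sql
-- Sketch: keep only the most recent row per CallRailCallId.
-- Assumes a ts TIMESTAMP column exists (not shown in the question).
CREATE OR REPLACE TABLE `project.dataset.your_table` AS
SELECT * EXCEPT(rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY CallRailCallId
      ORDER BY ts DESC  -- newest row in each group gets rn = 1
    ) AS rn
  FROM `project.dataset.your_table`
)
WHERE rn = 1
```

If two rows in the same group share the same ts, ROW_NUMBER() still assigns rn = 1 to only one of them, so exactly one row per CallRailCallId remains, but which of the tied rows is kept is not defined.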

Option #3

This option uses DML (DELETE FROM) but requires an extra column to serve as a tie-breaker.

For example, suppose you have a ts TIMESTAMP field and you want the most recent row (based on ts) to survive:

DELETE FROM `project.dataset.your_table`
WHERE STRUCT(CallRailCallId, ts) NOT IN (
  SELECT AS STRUCT CallRailCallId, MAX(ts) ts
  FROM `project.dataset.your_table`
  GROUP BY CallRailCallId
  )
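Whichever option is used, it may be worth checking afterwards that no duplicates remain. The query below is a small sketch of such a check; the copies alias is just an illustrative name.

```sql
-- Sketch: list keys that still have more than one row.
-- An empty result means every CallRailCallId is now unique.
SELECT
  CallRailCallId,
  COUNT(*) AS copies
FROM `project.dataset.your_table`
GROUP BY CallRailCallId
HAVING COUNT(*) > 1
ORDER BY copies DESC
```

This matters most for the DELETE option: rows that share both the same CallRailCallId and the same maximum ts all match the subquery, so none of them is deleted and they would show up in this check.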

answered Sep 27 '22 23:09

Mikhail Berlyant

Question asked by Mayank
