I have a table with many duplicate items – Many rows with the same id
, perhaps with the only difference being a requested_at
column.
I'd like to do a select *
from the table, but only return one row with the same id
– the most recently requested.
I've looked into group by id
but then I need to do an aggregate for each column. This is easy with requested_at
– max(requested_at) as requested_at
– but the others are tough.
How do I make sure I get the value for title
, etc that corresponds to that most recently updated row?
As mentioned earlier, BigQuery UPSERT means UPDATE+INSERT. It is essentially INSERT that behaves like an UPDATE in case of duplicates, so as to not violate any uniqueness constraints.
A RECORD column can have REPEATED mode, which is represented as an array of STRUCT types. Also, a field within a record can be repeated, which is represented as a STRUCT that contains an ARRAY . An array cannot contain another array directly. For more information, see Declaring an ARRAY type.
SQL Delete Duplicate Rows using Group By and Having Clause According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.
I suggest a similar form that avoids a sort in the window function:
SELECT *
FROM (
SELECT
*,
MAX(<timestamp_column>)
OVER (PARTITION BY <id_column>)
AS max_timestamp,
FROM <table>
)
WHERE <timestamp_column> = max_timestamp
Try something like this:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY <id_column>
ORDER BY <timestamp column> DESC)
row_number,
FROM <table>
)
WHERE row_number = 1
Note it will add a row_number
column, which you might not want. To fix this, you can select individual columns by name in the outer select statement.
In your case, it sounds like the requested_at
column is the one you want to use in the ORDER BY
.
And, you will also want to use allow_large_results, set a destination table, and specify no flattening of results (if you have a schema with repeated fields).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With