Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL - Redshift remove duplicate rows without primary key

I have a big redshift database. the records do not have unique key.

I'd like to convert the below sql to postgresql.

DELETE FROM (
SELECT *,ROW_NUMBER() OVER (partition BY column1, column2, column3) AS rnum
FROM tablename ) t1
WHERE t1.rnum > 1;

perform the above sql,the following errors appear :

Amazon Invalid operation: syntax error at or near "(";

please :'( .

like image 988
hyeyoung Avatar asked Apr 24 '26 14:04

hyeyoung


1 Answers

In Postgres, you can do this using ctid. This is a system "column" that physically identifies each row.

The idea is:

delete from tablename
    where ctid not in (select min(t2.ctid)
                       from tablename t2
                       group by column1, column2, column3
                      );

I am not sure if Redshift supports ctid. But then again, despite the tags, your question is explicitly about Postgres.

like image 144
Gordon Linoff Avatar answered Apr 27 '26 09:04

Gordon Linoff



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!