Removing Duplicate Rows in PostgreSQL with multiple columns

Tags:

sql

postgresql

I have a table "votes" with the following columns: voter, election_year, election_type, party. I need to remove the duplicate rows for each combination of voter and election_year, so that only one row per combination remains, and I'm having trouble figuring out how to do this.

I ran the following:

WITH CTE AS(
SELECT voter, 
       election_year,
       ROW_NUMBER()OVER(PARTITION BY voter, election_year ORDER BY voter) as RN

FROM votes
)
DELETE
FROM CTE where RN>1

based on another Stack Overflow answer, but it seems this syntax only works in SQL Server. I've seen ways to do this using unique IDs, but this particular table doesn't have that luxury. How can I adapt the above script to remove the duplicates I need? Thanks!

EDIT: As requested, here is the table definition with some example data:

CREATE TABLE public.votes
(
    voter varchar(10),
    election_year smallint,
    election_type varchar(2),
    party varchar(3)
);

INSERT INTO votes
    (voter, election_year, election_type, party)
VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED')
;
asked Aug 19 '18 01:08 by JGrindal

2 Answers

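Here's one option. It deletes every row for which another row with the same voter and election_year has a larger ctid, so exactly one row per pair (the one with the highest ctid) is kept: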

DELETE FROM votes T1
    USING   votes T2
WHERE   T1.ctid < T2.ctid 
    AND T1.voter = T2.voter 
    AND T1.election_year  = T2.election_year;

See http://sqlfiddle.com/#!15/4d45d/5

answered Oct 04 '22 16:10 by mankowitz


Deleting from or updating CTEs doesn't work in Postgres; see the accepted answer to "PostgreSQL with-delete “relation does not exists”".

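Since you have no primary key, you can (ab)use the ctid system column to identify the rows to delete. ctid is the physical location of the row version within the table; it changes when the row is updated or the table is rewritten (for example by VACUUM FULL), so it is fine for a one-off statement like this but useless as a long-term row identifier.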

WITH
cte
AS
(
SELECT ctid,
       row_number() OVER (PARTITION BY voter,
                                       election_year
                          ORDER BY voter) rn
       FROM votes
)
DELETE FROM votes
       USING cte
       WHERE cte.rn > 1
             AND cte.ctid = votes.ctid;

db<>fiddle
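One caveat with the query above: ORDER BY voter inside PARTITION BY voter, election_year doesn't decide anything, because every row in a partition has the same voter, so which row gets rn = 1 (and survives) is effectively arbitrary. If it matters which row you keep, order by a column that distinguishes the rows. Below is a minimal, self-contained sketch on the question's sample data; keeping the alphabetically first election_type is a hypothetical rule chosen only for illustration, and ctid is added as a final tie-breaker.

```sql
-- Sample data from the question.
CREATE TABLE votes
(
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);

INSERT INTO votes (voter, election_year, election_type, party)
VALUES ('2435871347', 2018, 'PO', 'EV'),
       ('2435871347', 2018, 'RU', 'EV'),
       ('2435871347', 2018, 'GE', 'EV'),
       ('2435871347', 2016, 'PO', 'EV'),
       ('2435871347', 2016, 'GE', 'EV'),
       ('10215121/8', 2016, 'GE', 'ED');

-- Same ctid-based delete, but with a deterministic ORDER BY so the row kept
-- in each (voter, election_year) group is predictable.
-- Hypothetical rule: keep the lowest election_type; ctid breaks remaining ties.
WITH ranked AS (
    SELECT ctid,
           row_number() OVER (PARTITION BY voter, election_year
                              ORDER BY election_type, ctid) AS rn
    FROM votes
)
DELETE FROM votes
USING ranked
WHERE ranked.rn > 1
  AND ranked.ctid = votes.ctid;
```

Afterwards each (voter, election_year) pair has exactly one row; with this sample data the three survivors are the 'GE' rows.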

You should probably also consider introducing a primary key.
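A minimal sketch of what that could look like once the duplicates are gone, assuming PostgreSQL 10 or later (identity columns) and 9.5 or later (ON CONFLICT). The id column and the constraint name votes_voter_election_year_key are hypothetical choices, not part of the question's schema; the three sample rows stand in for the already deduplicated table.

```sql
-- Stand-in for the question's table after deduplication
-- (one row per voter and election_year).
CREATE TABLE votes
(
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);

INSERT INTO votes (voter, election_year, election_type, party)
VALUES ('2435871347', 2018, 'PO', 'EV'),
       ('2435871347', 2016, 'PO', 'EV'),
       ('10215121/8', 2016, 'GE', 'ED');

-- Hypothetical surrogate key so single rows can be addressed without ctid;
-- existing rows are numbered automatically when the column is added.
ALTER TABLE votes
    ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;

-- Reject any new row that repeats an existing (voter, election_year) pair.
ALTER TABLE votes
    ADD CONSTRAINT votes_voter_election_year_key UNIQUE (voter, election_year);

-- With the constraint in place, duplicates can be skipped at insert time
-- instead of being cleaned up later.
INSERT INTO votes (voter, election_year, election_type, party)
VALUES ('2435871347', 2018, 'RU', 'EV')
ON CONFLICT (voter, election_year) DO NOTHING;
```

Note that ADD CONSTRAINT ... UNIQUE fails while any duplicate (voter, election_year) pairs remain, so run one of the DELETE statements first.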

answered Oct 04 '22 15:10 by sticky bit