Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing duplicate rows from a table in DB2 in a single query

I have a table with 3 columns as below:

one   |   two    |  three  |   name
------------------------------------
 A1       B1          C1        xyz
 A1       B1          C1        pqr      -> should be deleted
 A1       B1          C1        lmn      -> should be deleted
 A2       B2          C2        abc
 A2       B2          C2        def      -> should be deleted
 A3       B3          C3        ghi
------------------------------------ 

The table is not having any primary key column. I do not have any control over the table and so I can not add any primary key column.

As shown, I want to delete the rows where the combination of one, two and three column is same. So if A1B1C1 is occurring thrice (as in above e.g.), the other two should be deleted and only one should stay.

How to achieve this through just one query in DB2 ?

My requirement is for a single query as I would be running it through a java program.

like image 907
Vicky Avatar asked Apr 10 '12 11:04

Vicky


People also ask

How do you eliminate duplicate values in db2 select of a single table?

Procedure. Specify the DISTINCT keyword in the query. The DISTINCT keyword excludes duplicate rows from your query result table, so that each row contains unique data. Restriction: You cannot use the DISTINCT keyword with LOB columns or XML columns.

How do I remove duplicate rows from a query?

The go to solution for removing duplicate rows from your result sets is to include the distinct keyword in your select statement. It tells the query engine to remove duplicates to produce a result set in which every row is unique.

How do you delete duplicate records from a table using SQL query?

To delete the duplicate rows from the table in SQL Server, you follow these steps: Find duplicate rows using GROUP BY clause or ROW_NUMBER() function. Use DELETE statement to remove the duplicate rows.


1 Answers

(This assumes you're on DB2 for Linux/Unix/Windows, other platforms may vary slightly)

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

Should get you what you're looking for.

The query uses the OLAP function ROWNUMBER() to assign a number for each row within each ONE, TWO, THREE combination. DB2 is then able to match the rows referenced by the fullselect (A) as the rows that the DELETE statement should remove from the table. In order to be able to use a fullselect as the target for a delete clause, it has to match the rules for a deletable view (see "deletable view" under the notes section).

Below is some proof (tested on LUW 9.7):

DECLARE GLOBAL TEMPORARY TABLE SESSION.TEST (
    one CHAR(2),
    two CHAR(2),
    three CHAR(2),
    name CHAR(3)
) ON COMMIT PRESERVE ROWS;

INSERT INTO SESSION.TEST VALUES 
    ('A1', 'B1', 'C1', 'xyz'),
    ('A1', 'B1', 'C1', 'pqr'),
    ('A1', 'B1', 'C1', 'lmn'),
    ('A2', 'B2', 'C2', 'abc'),
    ('A2', 'B2', 'C2', 'def'),
    ('A3', 'B3', 'C3', 'ghi');

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

SELECT * FROM SESSION.TEST;

Edit 2 March 2017:

In response to the question from Ahmed Anwar, if you need to capture what was deleted, you can also combine the delete with a "data change statement". In this example you could do something like the following, which would give you the "rn" column, one, two, and three:

SELECT * FROM OLD TABLE (
    DELETE FROM
        (SELECT 
             ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
            ,ONE
            ,TWO
            ,THREE
         FROM SESSION.TEST) AS A
    WHERE RN > 1
) OLD;
like image 132
bhamby Avatar answered Nov 03 '22 00:11

bhamby