How do I delete duplicates, and update the records that refer to those duplicates in SQL

Tags: sql, h2

1 Answer

The following code was tested with "H2 1.3.176 (2014-04-05) / embedded mode" in the web console. Two queries solve the problem as you stated it, and an additional preparation statement handles a case that does not appear in your data but should be covered as well. The preparation statement is explained further below; let's start with the two main queries:

First, every items.userid is rewritten to the id of the corresponding user entry whose name is all lowercase. Let's call the all-lowercase entries main and all other entries dup. Every items.userid that refers to a dup.id is then set to the corresponding main.id. A main entry corresponds to a dup entry if their names match in a case-insensitive comparison, i.e. main.name = lower(dup.name).

Second, all dup entries in the usr table are deleted. A dup entry is one where name <> lower(name).

So much for the basic requirements. Additionally, we should consider that some users might have only entries containing uppercase characters and no all-lowercase entry. To deal with this situation, a preparation statement is used: for each group of names that are equal when compared case-insensitively and that has no all-lowercase entry, it sets the name of one row (the one with the smallest id) to lowercase.
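Before changing anything, it can help to preview which rows count as dup and which main row each would map to. The query below is only a sketch, not part of the tested solution: it assumes the usr table from the script that follows has already been created and filled, and that string comparison is case-sensitive (the H2 default). The aliases dup_id, dup_name and main_id are names chosen here for illustration.

```sql
-- Preview only, changes nothing.
-- Lists every non-lowercase (dup) row together with the id of the
-- all-lowercase (main) row it would be mapped to.
-- main_id is NULL where no all-lowercase row exists yet for that name;
-- those are the groups the preparation statement is meant to fix.
select dup.id   as dup_id,
       dup.name as dup_name,
       main.id  as main_id
from usr dup
left join usr main
       on main.name = lower(dup.name)
where dup.name <> lower(dup.name)
order by dup.id;
```

Run against the sample data before the preparation statement, this should list ids 1 and 4 with main_id 2 and 3, and ids 5 and 6 (Mary, mAry) with a NULL main_id.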

drop table if exists usr;

CREATE TABLE usr
    (`id` int primary key, `name` varchar(5))
;

INSERT INTO usr
    (`id`, `name`)
VALUES
    (1, 'John'),
    (2, 'john'),
    (3, 'sally'),
    (4, 'saLlY'),
    (5, 'Mary'),
    (6, 'mAry')
;

drop table if exists items;

CREATE TABLE items
    (`id` int, `name` varchar(10), `userid` int references usr (`id`))
;

INSERT INTO items
    (`id`, `name`, `userid`)
VALUES
    (1, 'myitem', 1),
    (2, 'mynewitem', 2),
    (3, 'my-item', 3),
    (4, 'mynew-item', 4)
;

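-- Preparation: for each name group that has no all-lowercase entry,
-- lowercase the name of the row with the smallest id.
update usr set name = lower(name)
where id in (
    select min(ui.id) as minid
    from usr ui
    where lower(ui.name) not in (select ui2.name from usr ui2)
    group by lower(ui.name)
);

-- Step 1: point every item at the all-lowercase (main) entry of its user.
-- The unqualified userid in the subquery is items.userid.
update items set userid = (
    select umain.id as mainid
    from usr udupl, usr umain
    where umain.name = lower(umain.name)
      and lower(udupl.name) = lower(umain.name)
      and udupl.id = userid
);

-- Step 2: delete the remaining non-lowercase (dup) entries.
delete from usr where name <> lower(name);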

select * from usr;

select * from items;

Executing the statements above yields the following results:

select * from usr;
ID | NAME
---|------
2  | john
3  | sally
5  | mary

select * from items;
ID | NAME       | USERID
---|------------|-------
1  | myitem     | 2
2  | mynewitem  | 2
3  | my-item    | 3
4  | mynew-item | 3
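
As a sanity check after all three statements have run, the two queries below can confirm the result. They are a sketch under the same assumptions as the script above (tables usr and items as defined there); the aliases orphaned_items, name_key and n are illustrative names, not part of the original answer.

```sql
-- Check 1: items whose userid no longer matches any usr row.
-- With the foreign key on items.userid in place, this should be 0.
select count(*) as orphaned_items
from items i
left join usr u on u.id = i.userid
where i.userid is not null
  and u.id is null;

-- Check 2: names that still occur more than once when compared
-- case-insensitively. No rows means every duplicate group was collapsed.
select lower(name) as name_key, count(*) as n
from usr
group by lower(name)
having count(*) > 1;
```

For the sample data, the first query should report 0 and the second should return no rows. Since items.userid references usr.id, the delete itself would be rejected if any item still pointed at a dup row, so the first check mainly matters for schemas without that constraint.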


answered Nov 09 '22 02:11

Stephan Lechner
