Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I delete duplicates, and update the records that refer to those duplicates in SQL

Tags:

sql

h2

I have two tables:

User:(int id, varchar unique username)

Items: (int id, varchar name, int user_id)

currently, there are case insensitive duplicates in user table like:

1,John
2,john
3,sally
4,saLlY

and the Items table will then have

1,myitem,1
2,mynewitem,2
3,my-item,3
4,mynew-item,4

I've updated the code that inserts to user table to make sure it always inserts lowercase.

However, I need to migrate the database so that duplicates are removed from the user table, and the item table reference is updated so the user doesn't lose access to their items

I.E the data after migration will be:

User:

1,john
3,sally

Items

1,myitem,1
2,mynewitem,1
3,my-item,3
4,mynew-item,3

Since the user table has a unique constraint, i can't just set it to lower like

update public.user set username =lower(username)
like image 268
user171943 Avatar asked Jan 26 '17 22:01

user171943


People also ask

How do you delete duplicate records in SQL query?

SQL Delete Duplicate Rows using Group By and Having Clause According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.


1 Answers

The following code is tested with "H2 1.3.176 (2014-04-05) / embedded mode" on the web console. There are two queries that should solve the issue as you stated, and there is an additional preparation statement for considering a case that - though not shown in your data - should be considered, too. The preparation statement will be explained a little bit later; Let's start with the main two queries:

First, all items.userids will be rewritten to those of corresponding user entries with lower case name as follows: Let's call lower case entries main and non lower case entries dup. Then, every items.userid, which refers to a dup.id, will be set to a corresponding main.id. A main entry corresponds to a dup entry if a case-insensitive comparison of their names matches, i.e. main.name = lower(dup.name).

Second, all dup entries in the user table will be deleted. A dup entry is one where name <> lower(name).

So far the basic requirements. Additionally, we should consider that for some users there might exist only entries with upper case characters, but no "lower case entry". For dealing with this situation, a preparation statement is used, which sets - for each group of common names - one name out of each group to lowercase.

drop table if exists usr;

CREATE TABLE usr
    (`id` int primary key, `name` varchar(5))
;

INSERT INTO usr
    (`id`, `name`)
VALUES
    (1, 'John'),
    (2, 'john'),
    (3, 'sally'),
    (4, 'saLlY'),
    (5, 'Mary'),
    (6, 'mAry')

;

drop table if exists items;

CREATE TABLE items
    (`id` int, `name` varchar(10), `userid` int references usr (`id`))
;

INSERT INTO items
    (`id`, `name`, `userid`)
VALUES
    (1, 'myitem', 1),
    (2, 'mynewitem', 2),
    (3, 'my-item', 3),
    (4, 'mynew-item', 4)
;

update usr set name = lower(name) where id in (select min(ui.id) as minid from usr ui where lower(ui.name) not in (select ui2.name from usr ui2)
group by lower(name));

update items set userid =
(select umain.id as mainid from usr udupl, usr umain
 where umain.name = lower(umain.name)
     and lower(udupl.name) = lower(umain.name)
     and udupl.id = userid
);

delete from usr where name <> lower(name);

select * from usr;

select * from items;

Executing above statements yields the following results:

select * from usr;
ID  | NAME
----|-----
2   | john
3   | sally
5   | mary

select * from items;
ID | NAME     |USERID  
---|----------|------
1  |myitem    | 2
2  |mynewitem | 2
3  |my-item   | 3
4  |mynew-item| 3
like image 194
Stephan Lechner Avatar answered Nov 09 '22 02:11

Stephan Lechner