I have two tables:
User:(int id, varchar unique username)
Items: (int id, varchar name, int user_id)
currently, there are case insensitive duplicates in user table like:
1,John
2,john
3,sally
4,saLlY
and the Items table will then have
1,myitem,1
2,mynewitem,2
3,my-item,3
4,mynew-item,4
I've updated the code that inserts to user table to make sure it always inserts lowercase.
However, I need to migrate the database so that duplicates are removed from the user table, and the item table reference is updated so the user doesn't lose access to their items
I.E the data after migration will be:
User:
1,john
3,sally
Items
1,myitem,1
2,mynewitem,1
3,my-item,3
4,mynew-item,3
Since the user table has a unique constraint, i can't just set it to lower like
update public.user set username =lower(username)
SQL Delete Duplicate Rows using Group By and Having Clause According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.
The following code is tested with "H2 1.3.176 (2014-04-05) / embedded mode" on the web console. There are two queries that should solve the issue as you stated, and there is an additional preparation statement for considering a case that - though not shown in your data - should be considered, too. The preparation statement will be explained a little bit later; Let's start with the main two queries:
First, all items.userid
s will be rewritten to those of corresponding user entries with lower case name as follows: Let's call lower case entries main
and non lower case entries dup
. Then, every items.userid
, which refers to a dup.id
, will be set to a corresponding main.id
. A main entry corresponds to a dup entry if a case-insensitive comparison of their names matches, i.e. main.name = lower(dup.name)
.
Second, all dup entries in the user table will be deleted. A dup entry is one where name <> lower(name)
.
So far the basic requirements. Additionally, we should consider that for some users there might exist only entries with upper case characters, but no "lower case entry". For dealing with this situation, a preparation statement is used, which sets - for each group of common names - one name out of each group to lowercase.
drop table if exists usr;
CREATE TABLE usr
(`id` int primary key, `name` varchar(5))
;
INSERT INTO usr
(`id`, `name`)
VALUES
(1, 'John'),
(2, 'john'),
(3, 'sally'),
(4, 'saLlY'),
(5, 'Mary'),
(6, 'mAry')
;
drop table if exists items;
CREATE TABLE items
(`id` int, `name` varchar(10), `userid` int references usr (`id`))
;
INSERT INTO items
(`id`, `name`, `userid`)
VALUES
(1, 'myitem', 1),
(2, 'mynewitem', 2),
(3, 'my-item', 3),
(4, 'mynew-item', 4)
;
update usr set name = lower(name) where id in (select min(ui.id) as minid from usr ui where lower(ui.name) not in (select ui2.name from usr ui2)
group by lower(name));
update items set userid =
(select umain.id as mainid from usr udupl, usr umain
where umain.name = lower(umain.name)
and lower(udupl.name) = lower(umain.name)
and udupl.id = userid
);
delete from usr where name <> lower(name);
select * from usr;
select * from items;
Executing above statements yields the following results:
select * from usr;
ID | NAME
----|-----
2 | john
3 | sally
5 | mary
select * from items;
ID | NAME |USERID
---|----------|------
1 |myitem | 2
2 |mynewitem | 2
3 |my-item | 3
4 |mynew-item| 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With