I have some difficulty in removing duplicates rows. I thought user_id and time_id together acting as an identifier but there were even duplicates for those.
user_id (text), time_id(bigint), value1 (numeric)
user_id; time_id; value1|
aaa;1;3|
aaa;1;3|
aaa;2;4|
baa;3;1|
In this case how do I remove duplicates? Since I have 16 distinct values in time_id and 15,000 distinct ones in user_id, I tried something like this but I do not have an unique id..
DELETE FROM tablename a
USING tablename b
WHERE a.unique_id < b.unique_id
AND a.user_id = b.user_id
time_id = 1 (repeat till time_id 16)
Each table in Postgres has a few hidden system columns. One of them (ctid)
is unique by definition and can be used in cases when a primary key is missing.
DELETE FROM tablename a
USING tablename b
WHERE a.ctid < b.ctid
AND a.user_id = b.user_id
AND a.time_id = b.time_id;
The problem is due to lack of primary key. Using hidden columns should not be a systematic method (see comments below). Once you delete duplicates you should create a primary key on (user_id, time_id)
or create a new unique column for this purpose.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With