I will provide a simplified example of my problem.
I have two tables: reviews
and users
.
reviews
is updated with a bunch of reviews that users post. The process that fetches the reviews also returns information for the user that submitted it (and certain user data changes frequently).
I want to update users
whenever I update reviews
, in bulk, using COPY
. The issue arises for users
when the fetched data contains two or more reviews from the same user. If I do a simple INSERT ON CONFLICT
, I might end up with errors since and INSERT
statement cannot update the same row twice.
A SELECT DISTINCT
would solve that problem, but I also want to guarantee that I insert the latest data into the users
table. This is how I am doing it. Keep in mind I am doing this in bulk:
1. Create a temporary table so that we can COPY
to/from it.
CREATE TEMPORARY TABLE users_temp (
id uuid,
stat_1 integer,
stat_2 integer,
account_age_in_mins integer);
2. COPY
data into temporary table
COPY users_temp (
id,
stat_1,
stat_2,
account_age_in_mins) FROM STDIN CSV ENCODING 'utf-8';
3. Lock users
table and perform INSERT ON CONFLICT
LOCK TABLE users in EXCLUSIVE MODE;
INSERT INTO users SELECT DISTINCT ON (1)
users_temp.id,
users_temp.stat_1,
users_temp.stat_2,
users_temp.account_age_in_mins
FROM users_temp
ORDER BY 1, 4 DESC, 2, 3
ON CONFLICT (id) DO UPDATE
SET
stat_1 = EXCLUDED.stat_1,
stat_2 = EXCLUDED.stat_2,
account_age_in_mins = EXCLUDED.account_age_in_mins';
The reason I am doing a SELECT DISTINCT
and an ORDER BY
in step 3) is because I:
account_age_in_mins
.Is this the correct method to achieve my goal?
This is a very good approach. Maybe you can avoid the table-lock, when you lock only tuples you have in your temporary table. https://dba.stackexchange.com/questions/106121/locking-in-postgres-for-update-insert-combination
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With