Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is this the correct way to bulk INSERT ON CONFLICT in Postgres?

I will provide a simplified example of my problem.

I have two tables: reviews and users.

reviews is updated with a bunch of reviews that users post. The process that fetches the reviews also returns information for the user that submitted it (and certain user data changes frequently).

I want to update users whenever I update reviews, in bulk, using COPY. The issue arises for users when the fetched data contains two or more reviews from the same user. If I do a simple INSERT ON CONFLICT, I might end up with errors since and INSERT statement cannot update the same row twice.

A SELECT DISTINCT would solve that problem, but I also want to guarantee that I insert the latest data into the users table. This is how I am doing it. Keep in mind I am doing this in bulk:

1. Create a temporary table so that we can COPY to/from it.

CREATE TEMPORARY TABLE users_temp (
    id uuid, 
    stat_1 integer, 
    stat_2 integer, 
    account_age_in_mins integer);

2. COPY data into temporary table

COPY users_temp (
    id,
    stat_1,
    stat_2,
    account_age_in_mins) FROM STDIN CSV ENCODING 'utf-8';

3. Lock users table and perform INSERT ON CONFLICT

LOCK TABLE users in EXCLUSIVE MODE;

INSERT INTO users SELECT DISTINCT ON (1) 
    users_temp.id,
    users_temp.stat_1,
    users_temp.stat_2,
    users_temp.account_age_in_mins
FROM users_temp
ORDER BY 1, 4 DESC, 2, 3
ON CONFLICT (id) DO UPDATE
SET
    stat_1 = EXCLUDED.stat_1,
    stat_2 = EXCLUDED.stat_2,
    account_age_in_mins = EXCLUDED.account_age_in_mins';

The reason I am doing a SELECT DISTINCT and an ORDER BY in step 3) is because I:

  1. Only want to return one instance of the duplicated rows.
  2. From those duplicates make sure that I get the most up to date record by sorting on the account_age_in_mins.

Is this the correct method to achieve my goal?

like image 651
turnip Avatar asked Oct 16 '17 14:10

turnip


1 Answers

This is a very good approach. Maybe you can avoid the table-lock, when you lock only tuples you have in your temporary table. https://dba.stackexchange.com/questions/106121/locking-in-postgres-for-update-insert-combination

like image 199
maletin Avatar answered Oct 06 '22 02:10

maletin