Semantics of INSERT SELECT FOR UPDATE ON CONFLICT DO NOTHING RETURNING

We have encountered a very peculiar issue with our production system. Unfortunately, despite a lot of effort, I have not been able to reproduce the issue locally, so I cannot provide a minimal, complete, and verifiable example. Also, as this is production code, I have had to change the table names in the following example; however, I believe I am presenting all the relevant facts.

We have four tables bucket_holder, bucket, item and bucket_total created as follows:

CREATE TABLE bucket_holder (
  id SERIAL PRIMARY KEY,
  bucket_holder_uid UUID NOT NULL
);

CREATE TABLE bucket ( 
  id SERIAL PRIMARY KEY, 
  bucket_uid UUID NOT NULL, 
  bucket_holder_id INTEGER NOT NULL REFERENCES bucket_holder (id), 
  default_bucket BOOLEAN NOT NULL
);

CREATE TABLE item ( 
  id SERIAL PRIMARY KEY, 
  item_uid UUID NOT NULL, 
  bucket_id INTEGER NOT NULL REFERENCES bucket (id), 
  amount NUMERIC NOT NULL 
);

CREATE TABLE bucket_total ( 
  bucket_id INTEGER NOT NULL REFERENCES bucket (id), 
  amount NUMERIC NOT NULL 
);

There are also indexes on appropriate columns as follows:

CREATE UNIQUE INDEX idx1 ON bucket_holder (bucket_holder_uid);
CREATE UNIQUE INDEX idx2 ON bucket (bucket_uid);
CREATE UNIQUE INDEX idx3 ON item (item_uid);
CREATE UNIQUE INDEX idx4 ON bucket_total (bucket_id);

The idea is that a bucket_holder holds buckets, one of which is the default_bucket; buckets hold items; and each bucket has a unique bucket_total record containing the sum of the amounts of all of its items.
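As a side note, that invariant can be verified with a query like the following (my own sketch, not part of the original system; it assumes an empty bucket should have a stored total of 0):

```sql
-- Find buckets whose stored total disagrees with the sum of their items.
SELECT bucket_total.bucket_id,
       bucket_total.amount              AS stored_total,
       COALESCE(SUM(item.amount), 0)   AS computed_total
FROM bucket_total
LEFT JOIN item ON item.bucket_id = bucket_total.bucket_id
GROUP BY bucket_total.bucket_id, bucket_total.amount
HAVING bucket_total.amount IS DISTINCT FROM COALESCE(SUM(item.amount), 0);
```

A query like this, run periodically, would at least localize which buckets have drifted and by how much.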

We are trying to do bulk inserts into the item table as follows:

WITH
unnested AS ( 
  SELECT * 
  FROM UNNEST(
    ARRAY['00000000-0000-0000-0000-00000000001a', '00000000-0000-0000-0000-00000000002a']::UUID[], 
    ARRAY['00000000-0000-0000-0000-00000000001c', '00000000-0000-0000-0000-00000000002c']::UUID[], 
    ARRAY[1.11, 2.22]::NUMERIC[]
  ) 
  AS T(bucket_holder_uid, item_uid, amount) 
), 
inserted_item AS ( 
  INSERT INTO item (bucket_id, item_uid, amount) 
  SELECT bucket.id, unnested.item_uid, unnested.amount 
  FROM unnested 
  JOIN bucket_holder ON unnested.bucket_holder_uid = bucket_holder.bucket_holder_uid 
  JOIN bucket ON bucket.bucket_holder_id = bucket_holder.id 
  JOIN bucket_total ON bucket_total.bucket_id = bucket.id 
  WHERE bucket.default_bucket 
  FOR UPDATE OF bucket_total 
  ON CONFLICT DO NOTHING 
  RETURNING bucket_id, amount 
), 
total_for_bucket AS ( 
  SELECT bucket_id, SUM(amount) AS total 
  FROM inserted_item 
  GROUP BY bucket_id 
) 
UPDATE bucket_total 
SET amount = amount + total_for_bucket.total 
FROM total_for_bucket 
WHERE bucket_total.bucket_id = total_for_bucket.bucket_id

In reality the arrays passed in are dynamic and have length up to 1000, but all three arrays always have the same length. The arrays are always sorted by bucket_holder_uid to ensure that deadlock cannot occur. The point of the ON CONFLICT DO NOTHING is that we should be able to handle the situation where some of the items are already present (the conflict is on item_uid). In that case the bucket_total should of course not be updated.
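To illustrate the intended ON CONFLICT semantics in isolation (a sketch assuming the sample data below has been loaded and that the first bucket received id 1):

```sql
-- First insert succeeds; RETURNING yields one row.
INSERT INTO item (bucket_id, item_uid, amount)
VALUES (1, '00000000-0000-0000-0000-00000000001c', 1.11)
ON CONFLICT DO NOTHING
RETURNING bucket_id, amount;

-- Second insert conflicts on the unique item_uid index and is skipped;
-- RETURNING yields zero rows, so a CTE built on it contributes nothing
-- to the subsequent total.
INSERT INTO item (bucket_id, item_uid, amount)
VALUES (1, '00000000-0000-0000-0000-00000000001c', 1.11)
ON CONFLICT DO NOTHING
RETURNING bucket_id, amount;
```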

This query assumes that appropriate bucket_holder, bucket and bucket_total records already exist. It is ok for the query to fail otherwise as in practice this situation will not occur. Here is an example of setting up some sample data:

INSERT INTO bucket_holder (bucket_holder_uid) VALUES ('00000000-0000-0000-0000-00000000001a');
INSERT INTO bucket (bucket_uid, bucket_holder_id, default_bucket) VALUES ('00000000-0000-0000-0000-00000000001b', (SELECT id FROM bucket_holder WHERE bucket_holder_uid = '00000000-0000-0000-0000-00000000001a'), TRUE);
INSERT INTO bucket_total (bucket_id, amount) VALUES ((SELECT id FROM bucket WHERE bucket_uid = '00000000-0000-0000-0000-00000000001b'), 0);

INSERT INTO bucket_holder (bucket_holder_uid) VALUES ('00000000-0000-0000-0000-00000000002a');
INSERT INTO bucket (bucket_uid, bucket_holder_id, default_bucket) VALUES ('00000000-0000-0000-0000-00000000002b', (SELECT id FROM bucket_holder WHERE bucket_holder_uid = '00000000-0000-0000-0000-00000000002a'), TRUE);
INSERT INTO bucket_total (bucket_id, amount) VALUES ((SELECT id FROM bucket WHERE bucket_uid = '00000000-0000-0000-0000-00000000002b'), 0);

This query appears to have done the correct thing for hundreds of thousands of items, but for a handful of items, the bucket_total has been updated by twice the amount of the item. I don't know if it's been updated twice or if it was updated once by twice the amount of the item. However in these cases, only one item has been inserted (inserting twice would be impossible anyway as there is a uniqueness constraint on item_uid). Our logs suggest that for the affected buckets, two threads were executing the query simultaneously.

Can anyone see and explain any issue with this query and indicate how it could be rewritten?

We are using PostgreSQL 9.6.6.

UPDATE

We've spoken to a core PostgreSQL developer about this, who apparently doesn't see a concurrency issue here. We're now investigating really nasty possibilities such as index corruption, or the (remote) chance of a PostgreSQL bug.

asked Nov 03 '18 by cpp beginner



1 Answer

Some thoughts while waiting for more data

Based on the problem you describe, it sounds like either the inserted_item CTE is returning duplicates or the UPDATE statement somehow got executed twice. Both seem unlikely, so this could be a PostgreSQL bug. It may help to simplify the query as much as possible.

Some ideas: it looks like you always put items into the holder's default bucket. In that case the join to the bucket table doesn't make much sense (it is a one-to-many join). Why not store the default bucket id directly in the holder table (or resolve it in a separate CTE)?

This line doesn't seem to do anything: JOIN bucket_total ON bucket_total.bucket_id = bucket.id

It would probably be sufficient just to insert data into the item table. Why not have bucket_total as a view (something like SELECT bucket_id, SUM(amount) ... FROM item ...)? If that takes too long to compute, make it a materialized view or a reporting table. Or, if you run this script many times a day, create a trigger on the item table that adds to or subtracts from bucket_total on insert/delete.
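The trigger idea could be sketched roughly like this (names are made up, and this only handles INSERT and DELETE on item, not UPDATE of amount):

```sql
-- Keep bucket_total in sync with item via a row-level trigger.
CREATE FUNCTION maintain_bucket_total() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE bucket_total SET amount = amount + NEW.amount
    WHERE bucket_id = NEW.bucket_id;
    RETURN NEW;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE bucket_total SET amount = amount - OLD.amount
    WHERE bucket_id = OLD.bucket_id;
    RETURN OLD;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER item_bucket_total
AFTER INSERT OR DELETE ON item
FOR EACH ROW EXECUTE PROCEDURE maintain_bucket_total();
```

With this in place, the row lock taken by the trigger's UPDATE also serializes concurrent inserts into the same bucket.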

Assuming that, you could reduce your query to something like this:

WITH
unnested AS (....)
INSERT INTO item (bucket_id, item_uid, amount)
SELECT bucket_holder2.dflt_bucket_id, unnested.item_uid, unnested.amount
FROM unnested
JOIN bucket_holder2 ON unnested.bucket_holder_uid = bucket_holder2.bucket_holder_uid
ON CONFLICT DO NOTHING

Update: I tried running those queries on 9.6 and they worked fine, so I don't think there is an issue with the query or with PostgreSQL itself; it may be time to recreate the table or database. Another idea for testing: change the final "UPDATE" on bucket_total to an "INSERT", removing the current unique key and adding an incremental primary key. That way you can catch and repair double insertions (if that is what is happening).
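Concretely, that testing idea might look something like this (a sketch; bucket_total_log is a made-up name):

```sql
-- Append-only variant of bucket_total: each run inserts a delta row
-- instead of updating in place, so a double application shows up as
-- two rows rather than a silently doubled total.
CREATE TABLE bucket_total_log (
  id SERIAL PRIMARY KEY,
  bucket_id INTEGER NOT NULL REFERENCES bucket (id),
  delta NUMERIC NOT NULL,
  inserted_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- The final UPDATE in the original statement would become:
--   INSERT INTO bucket_total_log (bucket_id, delta)
--   SELECT bucket_id, total FROM total_for_bucket;

-- The current total per bucket is then:
SELECT bucket_id, SUM(delta) AS amount
FROM bucket_total_log
GROUP BY bucket_id;
```

The inserted_at column makes it possible to correlate any duplicate delta rows with application logs from the two concurrent threads.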

answered Oct 05 '22 by Mike Twc