We have encountered a very peculiar issue with our production system. Unfortunately despite a lot of effort, I have not been able to reproduce the issue locally, so I cannot provide a minimal, complete and verifiable example. Also, as this is production code, I have had to change the names of the tables in the following example. However I believe I am presenting all the relevant facts.
We have four tables, bucket_holder, bucket, item and bucket_total, created as follows:
CREATE TABLE bucket_holder (
id SERIAL PRIMARY KEY,
bucket_holder_uid UUID NOT NULL
);
CREATE TABLE bucket (
id SERIAL PRIMARY KEY,
bucket_uid UUID NOT NULL,
bucket_holder_id INTEGER NOT NULL REFERENCES bucket_holder (id),
default_bucket BOOLEAN NOT NULL
);
CREATE TABLE item (
id SERIAL PRIMARY KEY,
item_uid UUID NOT NULL,
bucket_id INTEGER NOT NULL REFERENCES bucket (id),
amount NUMERIC NOT NULL
);
CREATE TABLE bucket_total (
bucket_id INTEGER NOT NULL REFERENCES bucket (id),
amount NUMERIC NOT NULL
);
There are also indexes on appropriate columns as follows:
CREATE UNIQUE INDEX idx1 ON bucket_holder (bucket_holder_uid);
CREATE UNIQUE INDEX idx2 ON bucket (bucket_uid);
CREATE UNIQUE INDEX idx3 ON item (item_uid);
CREATE UNIQUE INDEX idx4 ON bucket_total (bucket_id);
The idea is that a bucket_holder holds buckets, one of which is a default_bucket; buckets hold items; and each bucket has a unique bucket_total record containing the sum of the amounts of all its items.
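As a sanity check, the invariant we expect to hold at all times can be expressed roughly as follows; any rows returned would be buckets whose stored total has drifted from the sum of their items (just a sketch against the schema above):
SELECT bt.bucket_id, bt.amount AS stored_total, COALESCE(SUM(i.amount), 0) AS item_sum
FROM bucket_total bt
LEFT JOIN item i ON i.bucket_id = bt.bucket_id
GROUP BY bt.bucket_id, bt.amount
HAVING bt.amount <> COALESCE(SUM(i.amount), 0);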
We are trying to do bulk inserts into the item table as follows:
WITH
unnested AS (
SELECT *
FROM UNNEST(
ARRAY['00000000-0000-0000-0000-00000000001a', '00000000-0000-0000-0000-00000000002a']::UUID[],
ARRAY['00000000-0000-0000-0000-00000000001c', '00000000-0000-0000-0000-00000000002c']::UUID[],
ARRAY[1.11, 2.22]::NUMERIC[]
)
AS T(bucket_holder_uid, item_uid, amount)
),
inserted_item AS (
INSERT INTO item (bucket_id, item_uid, amount)
SELECT bucket.id, unnested.item_uid, unnested.amount
FROM unnested
JOIN bucket_holder ON unnested.bucket_holder_uid = bucket_holder.bucket_holder_uid
JOIN bucket ON bucket.bucket_holder_id = bucket_holder.id
JOIN bucket_total ON bucket_total.bucket_id = bucket.id
WHERE bucket.default_bucket
FOR UPDATE OF bucket_total
ON CONFLICT DO NOTHING
RETURNING bucket_id, amount
),
total_for_bucket AS (
SELECT bucket_id, SUM(amount) AS total
FROM inserted_item
GROUP BY bucket_id
)
UPDATE bucket_total
SET amount = amount + total_for_bucket.total
FROM total_for_bucket
WHERE bucket_total.bucket_id = total_for_bucket.bucket_id
In reality the arrays passed in are dynamic and have length up to 1000, but all three arrays always have the same length. The arrays are sorted by bucket_holder_uid so that deadlock cannot occur. The point of the ON CONFLICT DO NOTHING is that we should be able to handle the situation where some of the items are already present (the conflict is on item_uid). In that case the bucket_total should of course not be updated.
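As an aside, since the only conflict we care about is on item_uid, the conflict target can be spelled out explicitly; a bare ON CONFLICT DO NOTHING silently skips conflicts on any constraint. A minimal single-row sketch (the literal values and bucket id 1 are just for illustration; it relies on the idx3 unique index above):
INSERT INTO item (bucket_id, item_uid, amount)
VALUES (1, '00000000-0000-0000-0000-00000000001c', 1.11)
ON CONFLICT (item_uid) DO NOTHING;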
This query assumes that appropriate bucket_holder, bucket and bucket_total records already exist. It is OK for the query to fail otherwise, as in practice this situation will not occur. Here is an example of setting up some sample data:
INSERT INTO bucket_holder (bucket_holder_uid) VALUES ('00000000-0000-0000-0000-00000000001a');
INSERT INTO bucket (bucket_uid, bucket_holder_id, default_bucket) VALUES ('00000000-0000-0000-0000-00000000001b', (SELECT id FROM bucket_holder WHERE bucket_holder_uid = '00000000-0000-0000-0000-00000000001a'), TRUE);
INSERT INTO bucket_total (bucket_id, amount) VALUES ((SELECT id FROM bucket WHERE bucket_uid = '00000000-0000-0000-0000-00000000001b'), 0);
INSERT INTO bucket_holder (bucket_holder_uid) VALUES ('00000000-0000-0000-0000-00000000002a');
INSERT INTO bucket (bucket_uid, bucket_holder_id, default_bucket) VALUES ('00000000-0000-0000-0000-00000000002b', (SELECT id FROM bucket_holder WHERE bucket_holder_uid = '00000000-0000-0000-0000-00000000002a'), TRUE);
INSERT INTO bucket_total (bucket_id, amount) VALUES ((SELECT id FROM bucket WHERE bucket_uid = '00000000-0000-0000-0000-00000000002b'), 0);
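If the bulk insert behaves as intended against this sample data, a check along these lines (just a sketch) should show each default bucket's total equal to its single item, i.e. 1.11 and 2.22:
SELECT b.bucket_uid, bt.amount
FROM bucket_total bt
JOIN bucket b ON b.id = bt.bucket_id
ORDER BY b.bucket_uid;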
This query appears to have done the correct thing for hundreds of thousands of items, but for a handful of items the bucket_total has been updated by twice the amount of the item. I don't know if it was updated twice or updated once by twice the amount of the item. However, in these cases only one item has been inserted (inserting twice would be impossible anyway, as there is a uniqueness constraint on item_uid). Our logs suggest that for the affected buckets, two threads were executing the query simultaneously.
Can anyone see and explain any issue with this query and indicate how it could be rewritten?
We are using PostgreSQL 9.6.6.
UPDATE
We've spoken to a core postgres developer about this, who apparently doesn't see a concurrency issue here. We're now investigating really nasty possibilities such as index corruption, or the (remote) chance of a pg bug.
PostgreSQL's INSERT ... ON CONFLICT statement lets you handle the case where the row being inserted collides with an existing row on a unique constraint: you can either update the existing row (ON CONFLICT DO UPDATE, commonly known as UPSERT) or skip the new row entirely (ON CONFLICT DO NOTHING). It is broadly comparable to MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
You must have INSERT privilege on a table in order to insert into it. If ON CONFLICT DO UPDATE is present, UPDATE privilege on the table is also required. If a column list is specified, you only need INSERT privilege on the listed columns.
The actual implementation within PostgreSQL uses the INSERT command with a special ON CONFLICT clause to specify what to do if the record already exists within the table. You can specify whether you want the record to be updated if it's found in the table already or silently skipped.
NOT EXISTS will still attempt to insert duplicates if they exist within the source rows themselves, and the statement will then fail on the unique constraint. ON CONFLICT DO NOTHING will instead succeed, inserting one of the duplicate source rows (which one is undefined) and silently skipping the rest.
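To illustrate the difference, a self-contained sketch using a throwaway table named demo (not part of the schema above):
CREATE TABLE demo (k INTEGER PRIMARY KEY);
-- NOT EXISTS: both source rows pass the check (the table is empty when the
-- statement starts); the second row then violates the primary key and the
-- whole statement fails.
INSERT INTO demo (k)
SELECT v
FROM (VALUES (1), (1)) AS src(v)
WHERE NOT EXISTS (SELECT 1 FROM demo WHERE demo.k = src.v);
-- ON CONFLICT DO NOTHING: one of the duplicates is inserted, the other is
-- silently skipped, and the statement succeeds.
INSERT INTO demo (k)
SELECT v
FROM (VALUES (1), (1)) AS src(v)
ON CONFLICT DO NOTHING;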
Some thoughts while waiting for more data
Based on the problem you have, it sounds like either the inserted_item CTE is returning duplicates or the update statement somehow got executed twice. Both sound weird, probably a pg bug? Maybe try to simplify the query as much as possible.
Some ideas: it looks like you put items into some default bucket first. It doesn't make much sense to have a join to the bucket table in this case (a one-to-many join). Why not just keep the default bucket id in the holder table (or have a separate CTE for that, as sketched below)?
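For example, a separate CTE that resolves each holder's default bucket once, which could then replace the two joins in the INSERT (a sketch, meant to slot into the WITH list above):
default_bucket AS (
    SELECT bucket_holder.bucket_holder_uid, bucket.id AS bucket_id
    FROM bucket
    JOIN bucket_holder ON bucket.bucket_holder_id = bucket_holder.id
    WHERE bucket.default_bucket
)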
Also, this line doesn't seem to do anything: JOIN bucket_total ON bucket_total.bucket_id = bucket.id
Probably it would be sufficient just to insert data into the item table. Why not have bucket_total as a view (something like SELECT bucket_id, SUM(amount) ... FROM item ...)? If it takes a while to populate, maybe have it as a materialized view or a reporting table. Or, if you run that script many times during the day, create a trigger on the item table that adds/subtracts the item's amount to/from bucket_total on insert/delete (sketched below).
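A rough sketch of the trigger idea (the function, trigger and view names are made up for illustration; it assumes the bucket_total row already exists, as in your setup):
CREATE FUNCTION maintain_bucket_total() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE bucket_total SET amount = amount + NEW.amount WHERE bucket_id = NEW.bucket_id;
        RETURN NEW;
    ELSE  -- DELETE
        UPDATE bucket_total SET amount = amount - OLD.amount WHERE bucket_id = OLD.bucket_id;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER item_bucket_total_trg
AFTER INSERT OR DELETE ON item
FOR EACH ROW EXECUTE PROCEDURE maintain_bucket_total();

And the view variant would look something like:
CREATE VIEW bucket_total_view AS
SELECT bucket_id, SUM(amount) AS amount
FROM item
GROUP BY bucket_id;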
assuming that you can reduce your query to something like this:
WITH
unnested AS (....)
INSERT INTO item (bucket_id, item_uid, amount)
SELECT bucket_holder2.dflt_bucket_id, unnested.item_uid, unnested.amount
FROM unnested
JOIN bucket_holder2 ON unnested.bucket_holder_uid = bucket_holder2.bucket_holder_uid
ON CONFLICT DO NOTHING
Update: I tried to run those queries on 9.6 and they worked fine, so I'd think there is no issue with the query or with pg; probably it's time to recreate the table/database. Another idea for testing: you can try to change the final UPDATE of bucket_total to an INSERT, removing the current unique key and creating an incremental primary key. This way you can catch/repair double insertions (if that's the case); a sketch follows.
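A sketch of that diagnostic variant (the table and column names here are made up):
CREATE TABLE bucket_total_log (
    id SERIAL PRIMARY KEY,
    bucket_id INTEGER NOT NULL REFERENCES bucket (id),
    delta NUMERIC NOT NULL
);

-- In the big query, the final UPDATE would become:
--   INSERT INTO bucket_total_log (bucket_id, delta)
--   SELECT bucket_id, total FROM total_for_bucket;

-- A double application then shows up as two log rows instead of a silently
-- doubled total, and the current total can be reconstructed on demand:
SELECT bucket_id, SUM(delta) AS amount
FROM bucket_total_log
GROUP BY bucket_id;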