UPDATE: I eliminated Hibernate from the problem and completely reworked the description to simplify it as much as possible.
I have a master table with a no-op trigger and a detail table, with two relations between the master and detail tables:
create table detail (
    id bigint not null,
    code varchar(255) not null,
    primary key (id)
);

create table master (
    id bigint not null,
    name varchar(255),
    detail_id bigint, -- "preferred" detail is a one-to-one relation
    primary key (id),
    unique (detail_id),
    foreign key (detail_id) references detail(id)
);

create table detail_candidate ( -- "candidate" details = many-to-many relation modeled as a join table
    master_id bigint not null,
    detail_id bigint not null,
    primary key (master_id, detail_id),
    foreign key (detail_id) references detail(id),
    foreign key (master_id) references master(id)
);
create or replace function trgf() returns trigger as $$
begin
    return NEW; -- no-op: just passes the row through unchanged
end;
$$ language plpgsql;

create trigger trg
    before insert or update on master
    for each row execute procedure trgf();

insert into master (id, name) values (1000, 'x'); -- this is part of database setup
insert into detail (code, id) values ('a', 1); -- this is part of database setup
With this setup in place, I open two terminal windows with psql and perform the following steps:

-- terminal 1:
begin;
update master set detail_id=null, name='y' where id=1000;

-- terminal 2:
begin;
set statement_timeout = 4000;
insert into detail_candidate (master_id, detail_id) values (1000, 1);
The last command in the second terminal times out with the message:

ERROR: canceling statement due to statement timeout
CONTEXT: while locking tuple (0,1) in relation "master"
SQL statement "SELECT 1 FROM ONLY "public"."master" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
My observations and questions (the changes are independent of each other):

- If drop trigger trg on master; is called after the initial setup, everything works fine. Why does the presence of a no-op trigger have such an influence? I don't get it.
- If the unique constraint on master.detail_id is dropped (i.e. alter table master drop constraint master_detail_id_key; is called after the initial setup), everything works fine too. Why?
- If I omit the detail_id=null assignment from the update statement in the first terminal (since the value is null from the setup anyway, as sketched below), everything works fine too. Why?
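For completeness, the third observation in concrete form; everything else stays as above, only the first terminal's update no longer touches detail_id:

-- terminal 1, observation 3: no detail_id assignment
begin;
update master set name='y' where id=1000;
-- the insert in terminal 2 now succeeds immediately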
Tried on Postgres 9.6.12 (embedded), 9.6.15 (in Docker), 11.5 (in Docker). The problem is reproducible with the Docker image tomaszalusky/trig-example, which is available on DockerHub or can be built from this Dockerfile (instructions inside).
UPDATE 2: I found behaviour common to all three observations above. I ran the query select * from pgrowlocks('master') from the pgrowlocks extension in the second transaction. The row-level lock on the updated row in master is FOR UPDATE in the failing case but FOR NO KEY UPDATE in all three working cases. This is in perfect compliance with the lock mode conflict table in the documentation, since FOR UPDATE is the stronger mode, and the mode requested by the insert statement is FOR KEY SHARE (which is apparent from the error message; also, invoking a select ... for key share command has the same effect as the insert command).
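For anyone reproducing this, a sketch of that check (run in the second session while the first transaction is still open; the exact output formatting may vary by version):

create extension if not exists pgrowlocks;
select locked_row, modes from pgrowlocks('master');
-- failing case (trigger present): row (0,1) is reported locked FOR UPDATE
-- all three working cases:        row (0,1) is reported locked FOR NO KEY UPDATE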
The documentation on the FOR UPDATE mode says:
The FOR UPDATE lock mode is also acquired by (...) an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (...)
This is true for the master.detail_id column. However, it is still not clear why the FOR UPDATE mode isn't chosen independently of the trigger's presence, and how the presence of the trigger causes it.
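The two lock modes can also be observed in isolation with explicit row locks; a minimal sketch (using the same setup) that mirrors the conflict table from the documentation:

-- terminal 1
begin;
select * from master where id = 1000 for update;  -- strongest row-level lock

-- terminal 2: FOR KEY SHARE conflicts with FOR UPDATE, blocks like the failing insert
select 1 from master where id = 1000 for key share;

-- if terminal 1 instead takes the weaker lock,
--   select * from master where id = 1000 for no key update;
-- the FOR KEY SHARE in terminal 2 succeeds immediately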
Interesting problem. This is my best guess. I have tested none of it.
Generally speaking, Postgres's educated guessing about what effect statements will have on data does not extend into trigger logic. When executing the second statement, Postgres sees the foreign key constraint and knows it has to check whether the value being assigned (inserted) is valid, that is, whether it represents a valid key in the foreign table. It is possible, however bad the practice, that the trigger may have an effect on the validity of the proposed foreign key (e.g. if the trigger deletes records).

(Case 1) If there is no trigger, then Postgres can look at the data (both pre-commit and staged for commit) and decide whether the proposed value is guaranteed valid. (Case 2) If there is no FK constraint, then the trigger cannot impact the validity of the insertion, so it is allowed. (Case 3) If you omit the detail_id=null assignment, there is no change to that column in the update, so the trigger's presence is irrelevant.
I try to avoid both FK constraints and triggers whenever possible. It's better, in my opinion, to let the database accidentally contain partially incorrect data than to have it hang completely, like you're seeing here. I would drop all FK constraints and triggers, and force all update and insert operations to go through stored functions, which perform validation inside a begin/commit block and handle incorrect/invalid insert/update attempts appropriately and immediately, rather than forcing Postgres to wait for command 1 to commit before deciding whether command 2 is allowed. A sketch of such a function follows.
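A minimal sketch of that approach, under the assumption that the FK constraints on detail_candidate have been dropped (the function name and error handling are illustrative only):

create or replace function add_detail_candidate(p_master_id bigint, p_detail_id bigint)
returns void as $$
begin
    -- validate by hand instead of relying on FK checks and their FOR KEY SHARE locks
    if not exists (select 1 from master where id = p_master_id) then
        raise exception 'master % does not exist', p_master_id;
    end if;
    if not exists (select 1 from detail where id = p_detail_id) then
        raise exception 'detail % does not exist', p_detail_id;
    end if;
    insert into detail_candidate (master_id, detail_id) values (p_master_id, p_detail_id);
end;
$$ language plpgsql;

Note that without the FK constraint this validation is not race-free; accepting that risk in exchange for not blocking is exactly the trade-off described above.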
Edit: see this question
Edit 2: The closest thing I can find to official documentation on the timing of triggers relative to constraint checking is this, from the triggers docs:
The trigger can be specified to fire before the operation is attempted on a row (before constraints are checked and the INSERT, UPDATE, or DELETE is attempted); or after the operation has completed (after constraints are checked and the INSERT, UPDATE, or DELETE has completed); or instead of the operation (in the case of inserts, updates or deletes on a view). If the trigger fires before or instead of the event, the trigger can skip the operation for the current row, or change the row being inserted (for INSERT and UPDATE operations only).
It is a bit unclear whether the trigger firing before the constraint check also applies to constraint checks performed by other transactions. Whatever the case, this behaviour is either a bug or poorly documented.