I'm working on Postgres 9.2.
There are 2 UPDATEs, each in their own transactions. One looks like:
UPDATE foo SET a=1 WHERE b IN (1,2,3,4);
The other is similar:
UPDATE foo SET a=2 WHERE b IN (1,2,3,4);
These could possibly run at the same time, and in reality have 500+ values in the 'IN' expression. I'm sometimes seeing deadlocks. Is it true that the order of items in the 'IN' expression may not actually influence the true lock ordering?
There is no ORDER BY in the UPDATE command. But there is for SELECT. Use row-level locking with the FOR UPDATE clause in a subquery:
UPDATE foo f
SET a = 1
FROM (
SELECT b FROM foo
WHERE b IN (1,2,3,4)
ORDER BY b
FOR UPDATE
) upd
WHERE f.b = upd.b;
Of course, b has to be UNIQUE, or you need to add more expressions to the ORDER BY clause to make it unambiguous. And you need to enforce the same order for all UPDATE, DELETE and SELECT .. FOR UPDATE statements on the table.
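For example, a DELETE on the same table could follow the same pattern. This is just a sketch reusing the values from the question; the same ORDER BY and FOR UPDATE in a subquery apply:

DELETE FROM foo f
USING (
   SELECT b FROM foo
   WHERE  b IN (1,2,3,4)
   ORDER  BY b
   FOR    UPDATE
   ) del
WHERE f.b = del.b;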
Yes. I think the main issue here is that IN checks for membership in the set specified, but does not confer any sort of ordering on the UPDATE, which in turn means that no concrete ordering is conferred upon the lock ordering.
The WHERE clause in an UPDATE statement essentially behaves the same way it does in a SELECT. For example, I will often simulate an UPDATE using a SELECT to check that what will be updated is what I expected.
With that in mind, the following example using SELECT demonstrates that IN does not in itself confer ordering:
Given this schema/data:
create table foo
(
id serial,
val text
);
insert into foo (val)
values ('one'), ('two'), ('three'), ('four');
The following queries:
select *
from foo
where id in (1,2,3,4);
select *
from foo
where id in (4,3,2,1);
yield the exact same results -- the rows in order from id 1-4.
Even that isn't guaranteed, since I did not use an ORDER BY in the SELECT. Rather, without it, Postgres uses whatever order the server decides is fastest (see point 8 about ORDER BY in the Postgres SELECT doc). Given a fairly static table, it's often the same order in which the rows were inserted (as was the case here). However, there's nothing guaranteeing that, and if there's a lot of churn on the table (lots of dead tuples, rows removed, etc.), it's less likely to be the case.
I suspect that's what's happening here with your UPDATE. Sometimes -- if not even most of the time -- it may end up in numerical order if that's the same order the rows were inserted, but there's nothing to guarantee that, and the cases where you see the deadlocks are likely scenarios where the data has changed such that one update is ordered differently than the other.
Possible fixes/workarounds:
In terms of what you could do about it, there are various options, depending on your requirements. You could explicitly take out a table lock on the table, although that would of course have the effect of serializing the updates there, which may prove to be too large a bottleneck.
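A sketch of the table-lock approach: EXCLUSIVE mode conflicts with itself and with the ROW EXCLUSIVE lock that UPDATE takes, so concurrent writers simply queue up instead of deadlocking:

BEGIN;
-- blocks all other writers on foo until this transaction ends
LOCK TABLE foo IN EXCLUSIVE MODE;
UPDATE foo SET a = 1 WHERE b IN (1,2,3,4);
COMMIT;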
Another option -- which would still allow for concurrency -- is to explicitly iterate over the items using dynamic SQL in, say, Python. That way, you'd have a set of one-row updates that always occurred in the same order, and since you could ensure that consistent order, the normal Postgres locking should be able to handle the concurrency without deadlocking.
That won't perform as well as batch-updating in pure SQL, but it should solve the lock issue. One suggestion to bump up performance is to only COMMIT every so often, and not after every single row -- that saves a lot of overhead.
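For illustration, the generated statements might look like the following. This is only a sketch: it assumes the client sorts the ids ascending before emitting the statements, and it commits once per batch rather than per row:

BEGIN;
-- ids sorted ascending client-side, so every writer locks rows in the same order
UPDATE foo SET a = 1 WHERE b = 1;
UPDATE foo SET a = 1 WHERE b = 2;
UPDATE foo SET a = 1 WHERE b = 3;
UPDATE foo SET a = 1 WHERE b = 4;
COMMIT;  -- one commit per batch, not per row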
Another option would be to do the loop in a Postgres function written in PL/pgSQL. That function could then be called externally, in, say, Python, but the looping would be done (also explicitly) server-side, which may save on some overhead, since the looping and UPDATEs are done entirely server-side without having to go over the wire on each loop iteration.
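A minimal sketch of such a function -- the name update_foo_ordered and the assumption that b identifies the rows are mine, not from the question:

CREATE OR REPLACE FUNCTION update_foo_ordered(_ids int[], _new_a int)
RETURNS void AS
$$
DECLARE
   _id int;
BEGIN
   -- unnest the array and sort it, so every caller acquires
   -- the row locks in the same (ascending) order
   FOR _id IN SELECT unnest(_ids) ORDER BY 1 LOOP
      UPDATE foo SET a = _new_a WHERE b = _id;
   END LOOP;
END;
$$ LANGUAGE plpgsql;

-- the ids can be passed in any order; the function sorts them itself
SELECT update_foo_ordered(ARRAY[4,3,2,1], 2);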