Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres UPDATE with ORDER BY, how to do it?

Tags:

I need to do a Postgres update on a collection of records & I'm trying to prevent a deadlock which appeared in the stress tests.

The typical resolution to this is to update records in a certain order, by ID for example - but it seems that Postgres doesn't allow ORDER BY for UPDATE.

Assuming I need to do an update, for example:

UPDATE BALANCES WHERE ID IN (SELECT ID FROM some_function() ORDER BY ID); 

results in deadlocks when you run 200 queries concurrently. What to do?

I'm looking for a general solution, not case-specific workarounds like in UPDATE with ORDER BY

It feels that there must be a better solution than writing a cursor function. Also, if there's no better way, how would that cursor function optimally look like? Update record-by-record

like image 576
bbozo Avatar asked Jun 20 '17 18:06

bbozo


People also ask

Can we use ORDER BY in update query in PostgreSQL?

The typical resolution to this is to update records in a certain order, by ID for example - but it seems that Postgres doesn't allow ORDER BY for UPDATE. Assuming I need to do an update, for example: UPDATE BALANCES WHERE ID IN (SELECT ID FROM some_function() ORDER BY ID);

How does PostgreSQL ORDER BY work?

When you query data from a table, the SELECT statement returns rows in an unspecified order. To sort the rows of the result set, you use the ORDER BY clause in the SELECT statement. The ORDER BY clause allows you to sort rows returned by a SELECT clause in ascending or descending order based on a sort expression.

How updates work in Postgres?

PostgreSQL implements multiversioning by keeping the old version of the table row in the table – an UPDATE adds a new row version (“tuple”) of the row and marks the old version as invalid. In many respects, an UPDATE in PostgreSQL is not much different from a DELETE followed by an INSERT .


1 Answers

As far as I know, there's no way to accomplish this directly through the UPDATE statement; the only way to guarantee lock order is to explicitly acquire locks with a SELECT ... ORDER BY ID FOR UPDATE, e.g.:

UPDATE Balances SET Balance = 0 WHERE ID IN (   SELECT ID FROM Balances   WHERE ID IN (SELECT ID FROM some_function())   ORDER BY ID   FOR UPDATE ) 

This has the downside of repeating the ID index lookup on the Balances table. In your simple example, you can avoid this overhead by fetching the physical row address (represented by the ctid system column) during the locking query, and using that to drive the UPDATE:

UPDATE Balances SET Balance = 0 WHERE ctid = ANY(ARRAY(   SELECT ctid FROM Balances   WHERE ID IN (SELECT ID FROM some_function())   ORDER BY ID   FOR UPDATE )) 

(Be careful when using ctids, as the values are transient. We're safe here, as the locks will block any changes.)

Unfortunately, the planner will only utilise the ctid in a narrow set of cases (you can tell if it's working by looking for a "Tid Scan" node in the EXPLAIN output). To handle more complicated queries within a single UPDATE statement, e.g. if your new balance was being returned by some_function() alongside the ID, you'll need to fall back to the ID-based lookup:

UPDATE Balances SET Balance = Locks.NewBalance FROM (   SELECT Balances.ID, some_function.NewBalance   FROM Balances   JOIN some_function() ON some_function.ID = Balances.ID   ORDER BY Balances.ID   FOR UPDATE ) Locks WHERE Balances.ID = Locks.ID 

If the performance overhead is an issue, you'd need to resort to using a cursor, which would look something like this:

DO $$ DECLARE   c CURSOR FOR     SELECT Balances.ID, some_function.NewBalance     FROM Balances     JOIN some_function() ON some_function.ID = Balances.ID     ORDER BY Balances.ID     FOR UPDATE; BEGIN   FOR row IN c LOOP     UPDATE Balances     SET Balance = row.NewBalance     WHERE CURRENT OF c;   END LOOP; END $$ 
like image 187
Nick Barnes Avatar answered Sep 19 '22 13:09

Nick Barnes