Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Updating database rows without locking the table in PostgreSQL 9.2

Trying to run an update statement like this on a table, using PostgreSQL 9.2:

UPDATE table
    SET a_col = array[col];

We need to be able to run this on a ~10M row table, and not have it lock up the table (so normal operations can still happen while the update is running). I believe using a cursor will probably be the right solution, but I really have no idea if it is or how I should implement it using a cursor.

I have come up with this cursor code, which I think might be good.

Edit: Added cursor function

CREATE OR REPLACE FUNCTION update_fields() RETURNS VOID AS $$
DECLARE
        cursor CURSOR FOR SELECT * FROM table ORDER BY id FOR UPDATE;
BEGIN
        FOR row IN cursor LOOP
                UPDATE table SET
                        a_col = array[col],
                        a_col2= array[col2]
                WHERE CURRENT OF cursor;
        END LOOP;
END;
$$ LANGUAGE plpgsql;
like image 894
Juan Carlos Coto Avatar asked Apr 02 '13 17:04

Juan Carlos Coto


People also ask

How do I stop table locking in Postgres?

1: Never add a column with a default value Adding a column takes a very aggressive lock on the table, which blocks read and write. If you add a column with a default, PostgreSQL will rewrite the whole table to fill in the default for every row, which can take hours on large tables.

Does Postgres lock row for update?

PostgreSQL doesn't remember any information about modified rows in memory, so there is no limit on the number of rows locked at one time. However, locking a row might cause a disk write, e.g., SELECT FOR UPDATE modifies selected rows to mark them locked, and so will result in disk writes.

Do Postgres transactions lock the table?

There is no LOCK TABLE in the SQL standard, which instead uses SET TRANSACTION to specify concurrency levels on transactions. PostgreSQL supports that too; see SET TRANSACTION for details.

Does update statement lock the table?

Update lock does lock entire table's all rows.


1 Answers

MVCC

First off, if "normal operations" consist of SELECT queries, the MVCC model will take care of it automatically. UPDATE does not block SELECT and vice versa. SELECT only sees committed data (or what's been done in the same transaction), so the result of the big UPDATE remains invisible to other transactions until it's done (committed).

Performance / bloat

If you don't have other objects referencing that table,
and you don't have concurrent write operations (which would be lost!),
and you can afford a very short exclusive lock on the table,
and you have the additional disk space, of course:
You could keep the locking to a minimum by creating an updated version of the table in the background. Make sure it has everything to be a drop-in replacement, then drop the original and rename the dupe.

CREATE TABLE tbl_new (LIKE tbl_org INCLUDING CONSTRAINTS);

INSERT INTO tbl_new 
SELECT col_a, col_b, array[col] aS col_c
FROM   tbl_org;

I am using CREATE TABLE (LIKE .. INCLUDING CONSTRAINTS), because (quoting the manual here):

Not-null constraints are always copied to the new table. CHECK constraints will only be copied if INCLUDING CONSTRAINTS is specified; other types of constraints will never be copied.

Make sure, the new table is ready. Then:

DROP tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;

Results in an very short time window, where the table is locked exclusively.

This is really only about performance. It creates a new table without any bloat rather quickly. If you have foreign keys or views, you can still go that route, but you have to prepare a script to drop and recreate these objects, potentially creating additional exclusive locks.

Concurrent writes

With concurrent write operations, really all you can do, is split your update in chunks. You can't do that in a single transaction, since locks are only released at the end of a transaction.

You could employ dblink, which can launch independent transactions on another database, including itself. This way you could do it all in a single DO statement or a plpgsql function with a loop. Here is a loosely related answer with more information on dblink:

  • Drop or create database from stored procedure in PostgreSQL

Your approach with cursors

A cursor inside the function will not buy you anything. Any function is enclosed in a transaction automatically, and all locks are only released at the end of the transaction. Even if you used CLOSE cursor (which you don't) it would only free some resources, but not release acquired locks on the table. I quote the manual:

CLOSE closes the portal underlying an open cursor. This can be used to release resources earlier than end of transaction, or to free up the cursor variable to be opened again.

You would need to run separate transactions or (ab)use dblink which does that for you.

like image 179
Erwin Brandstetter Avatar answered Oct 13 '22 04:10

Erwin Brandstetter