I'm trying to run an UPDATE statement like this on a table, using PostgreSQL 9.2:
UPDATE table
SET a_col = array[col];
We need to be able to run this on a ~10M row table, and not have it lock up the table (so normal operations can still happen while the update is running). I believe using a cursor will probably be the right solution, but I really have no idea if it is or how I should implement it using a cursor.
I have come up with this cursor code, which I think might be good.
Edit: Added cursor function
CREATE OR REPLACE FUNCTION update_fields() RETURNS void AS
$$
DECLARE
   cursor CURSOR FOR SELECT * FROM table ORDER BY id FOR UPDATE;
BEGIN
   FOR row IN cursor LOOP
      UPDATE table
      SET    a_col  = array[col]
           , a_col2 = array[col2]
      WHERE CURRENT OF cursor;
   END LOOP;
END
$$ LANGUAGE plpgsql;
1: Never add a column with a default value. Adding a column takes a very aggressive lock on the table, which blocks reads and writes. If you add a column with a default, PostgreSQL rewrites the whole table to fill in the default for every row, which can take hours on large tables.
PostgreSQL doesn't keep any information about modified rows in memory, so there is no limit on the number of rows locked at one time. However, locking a row may cause a disk write; for example, SELECT FOR UPDATE modifies the selected rows to mark them locked, and so results in disk writes.
There is no LOCK TABLE in the SQL standard, which instead uses SET TRANSACTION to specify concurrency levels on transactions. PostgreSQL supports that too; see SET TRANSACTION for details.
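For illustration, a minimal sketch of that standard-SQL approach (the table name is a placeholder): instead of locking a table explicitly, you raise the transaction's isolation level.

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- statements that need the stronger isolation guarantees
SELECT count(*) FROM tbl;
COMMIT;
```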
Note that SELECT ... FOR UPDATE over the whole table locks every row in it.
First off, if "normal operations" consist of SELECT queries, the MVCC model will take care of it automatically. UPDATE does not block SELECT and vice versa. SELECT only sees committed data (or what's been done in the same transaction), so the result of the big UPDATE remains invisible to other transactions until it's done (committed).
If you don't have other objects referencing that table,
and you don't have concurrent write operations (which would be lost!),
and you can afford a very short exclusive lock on the table,
and you have the additional disk space, of course:
You could keep the locking to a minimum by creating an updated version of the table in the background. Make sure it has everything to be a drop-in replacement, then drop the original and rename the dupe.
CREATE TABLE tbl_new (LIKE tbl_org INCLUDING CONSTRAINTS);
INSERT INTO tbl_new
SELECT col_a, col_b, array[col] AS col_c
FROM tbl_org;
I am using CREATE TABLE (LIKE .. INCLUDING CONSTRAINTS), because (quoting the manual here):
Not-null constraints are always copied to the new table. CHECK constraints will only be copied if INCLUDING CONSTRAINTS is specified; other types of constraints will never be copied.
Make sure the new table is ready. Then:
DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;
This results in a very short time window during which the table is locked exclusively.
This is really only about performance. It creates a new table without any bloat rather quickly. If you have foreign keys or views, you can still go that route, but you have to prepare a script to drop and recreate these objects, potentially creating additional exclusive locks.
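The drop-and-rename can be wrapped in a single transaction so the exclusive lock covers both statements atomically. Also note that LIKE .. INCLUDING CONSTRAINTS does not copy indexes, so recreate those on the new table before the swap. A sketch with placeholder names:

```sql
-- Recreate indexes first; they are not copied by LIKE .. INCLUDING CONSTRAINTS.
CREATE INDEX tbl_new_id_idx ON tbl_new (id);

BEGIN;
DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;
COMMIT;
```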
With concurrent write operations, all you can really do is split your update into chunks. You can't do that in a single transaction, since locks are only released at the end of a transaction.
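For example, a sketch of one chunk, assuming a unique integer id column (a placeholder name); each statement must run as its own transaction (e.g. with autocommit on), advancing the id range between runs:

```sql
-- One chunk; run each id range in its own transaction so locks are released.
UPDATE tbl
SET    a_col = array[col]
WHERE  id >= 0
AND    id <  50000
AND    a_col IS DISTINCT FROM array[col];  -- skip rows that are already done
```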
You could employ dblink, which can launch independent transactions on another database, including itself. This way you could do it all in a single DO statement or a plpgsql function with a loop.
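A rough sketch of how that might look, assuming the dblink extension is installed and tbl has an integer id column (all names are placeholders, untested): each dblink_exec call runs and commits in its own connection, so row locks from one chunk don't pile up for the duration of the whole job.

```sql
DO
$$
DECLARE
   _lo   int := 0;
   _step int := 50000;
   _max  int;
BEGIN
   SELECT max(id) INTO _max FROM tbl;
   PERFORM dblink_connect('dbname=' || current_database());
   WHILE _lo <= _max LOOP
      -- each statement commits in its own independent transaction
      PERFORM dblink_exec(format(
         'UPDATE tbl SET a_col = array[col] WHERE id >= %s AND id < %s'
       , _lo, _lo + _step));
      _lo := _lo + _step;
   END LOOP;
   PERFORM dblink_disconnect();
END
$$;
```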
A cursor inside the function will not buy you anything. Any function is enclosed in a transaction automatically, and all locks are only released at the end of the transaction.
Even if you used CLOSE cursor (which you don't), it would only free some resources, but not release acquired locks on the table. I quote the manual:
CLOSE closes the portal underlying an open cursor. This can be used to release resources earlier than end of transaction, or to free up the cursor variable to be opened again.
You would need to run separate transactions or (ab)use dblink which does that for you.