
Updating database rows without locking the table in PostgreSQL 9.2

Tags: postgresql, sql-update, database-cursor

Trying to run an update statement like this on a table, using PostgreSQL 9.2:

UPDATE table
    SET a_col = array[col];

We need to be able to run this on a ~10M row table, and not have it lock up the table (so normal operations can still happen while the update is running). I believe using a cursor will probably be the right solution, but I really have no idea if it is or how I should implement it using a cursor.

I have come up with this cursor code, which I think might be good.

Edit: Added cursor function

CREATE OR REPLACE FUNCTION update_fields() RETURNS VOID AS $$
DECLARE
        cursor CURSOR FOR SELECT * FROM table ORDER BY id FOR UPDATE;
BEGIN
        FOR row IN cursor LOOP
                UPDATE table SET
                        a_col = array[col],
                        a_col2= array[col2]
                WHERE CURRENT OF cursor;
        END LOOP;
END;
$$ LANGUAGE plpgsql;


asked Apr 02 '13 17:04

Juan Carlos Coto

1 Answer

MVCC

First off, if "normal operations" consist of SELECT queries, the MVCC model will take care of it automatically. UPDATE does not block SELECT and vice versa. SELECT only sees committed data (or what's been done in the same transaction), so the result of the big UPDATE remains invisible to other transactions until it's done (committed).

Performance / bloat

If you don't have other objects referencing that table,
and you don't have concurrent write operations (which would be lost!),
and you can afford a very short exclusive lock on the table,
and you have the additional disk space, of course,
you could keep the locking to a minimum by creating an updated version of the table in the background. Make sure it has everything it needs to be a drop-in replacement, then drop the original and rename the copy.

CREATE TABLE tbl_new (LIKE tbl_org INCLUDING CONSTRAINTS);

INSERT INTO tbl_new
SELECT col_a, col_b, array[col] AS col_c
FROM   tbl_org;

I am using CREATE TABLE (LIKE .. INCLUDING CONSTRAINTS), because (quoting the manual here):

Not-null constraints are always copied to the new table. CHECK constraints will only be copied if INCLUDING CONSTRAINTS is specified; other types of constraints will never be copied.

Make sure the new table is ready. Then:

DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;

This results in a very short time window during which the table is locked exclusively.

This is really only about performance. It creates a new table without any bloat rather quickly. If you have foreign keys or views, you can still go that route, but you have to prepare a script to drop and recreate these objects, potentially creating additional exclusive locks.
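Since DDL is transactional in PostgreSQL, the swap can be wrapped in a single transaction so that other sessions never see a moment without tbl_org. The following is only a sketch: it assumes id is the key column (taken from the question, not confirmed), and it recreates the primary key by hand because indexes and primary keys are not copied by INCLUDING CONSTRAINTS.

```sql
-- Prepare the copy while the original table stays fully available.
ALTER TABLE tbl_new ADD PRIMARY KEY (id);   -- assumption: "id" is the key column
ANALYZE tbl_new;                            -- collect planner statistics for the new table

-- The swap itself: the exclusive lock lasts only for these two statements.
BEGIN;
DROP TABLE tbl_org;                         -- takes an ACCESS EXCLUSIVE lock, held until COMMIT
ALTER TABLE tbl_new RENAME TO tbl_org;
COMMIT;
```

If anything fails before COMMIT, the transaction rolls back and the original table is still in place.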

Concurrent writes

With concurrent write operations, all you can really do is split your update into chunks. You can't do that in a single transaction, since locks are only released at the end of a transaction.
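As a rough illustration of chunking, the statements below update one id range at a time, each in its own transaction. The table and column names come from the question and the answer above; the id ranges and the chunk size are hypothetical and would have to be adapted to the real data.

```sql
-- Run with autocommit on (the psql default), so each statement commits on its own.
-- The id ranges are hypothetical; pick a size that keeps each statement short.
UPDATE tbl_org SET a_col = ARRAY[col]
WHERE  id >= 1 AND id < 100001
AND    a_col IS DISTINCT FROM ARRAY[col];   -- skip rows already done, so a rerun is cheap

UPDATE tbl_org SET a_col = ARRAY[col]
WHERE  id >= 100001 AND id < 200001
AND    a_col IS DISTINCT FROM ARRAY[col];

-- ... and so on, up to max(id)

VACUUM tbl_org;   -- optional: lets later chunks reuse the space of dead row versions
```

Row locks are released after every statement, so a concurrent writer waits at most for the chunk that currently holds its row.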

You could employ dblink, which can launch independent transactions on another database, including the current one. This way you could do it all in a single DO statement or a PL/pgSQL function with a loop. Here is a loosely related answer with more information on dblink:

Drop or create database from stored procedure in PostgreSQL
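A minimal sketch of the dblink variant, under several assumptions: the dblink extension is installed (CREATE EXTENSION dblink), the current role may connect to the same database without a password, id is an integer column, and the chunk size of 10000 is arbitrary. Each dblink_exec call runs and commits in the remote session, independently of the surrounding DO block.

```sql
DO
$$
DECLARE
   _min  int;
   _max  int;
   _lo   int;
   _step int := 10000;   -- hypothetical chunk size
BEGIN
   SELECT min(id), max(id) INTO _min, _max FROM tbl_org;

   -- Open an unnamed connection back to the current database.
   -- Assumption: no password is needed; otherwise extend the connection string.
   PERFORM dblink_connect('dbname=' || current_database());

   _lo := _min;
   WHILE _lo <= _max LOOP
      -- Each call is its own transaction on the other connection,
      -- so its locks are released as soon as the chunk is done.
      PERFORM dblink_exec(format(
         'UPDATE tbl_org SET a_col = ARRAY[col] WHERE id >= %s AND id < %s',
         _lo, _lo + _step));
      _lo := _lo + _step;
   END LOOP;

   PERFORM dblink_disconnect();
END
$$;
```

Chunks that were already committed stay committed even if a later chunk fails, so the block should be safe to rerun.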

Your approach with cursors

A cursor inside the function will not buy you anything. Any function runs inside a transaction automatically, and all locks are only released at the end of that transaction. Even if you used CLOSE cursor (which you don't), it would only free some resources, but not release the locks already acquired. I quote the manual:

CLOSE closes the portal underlying an open cursor. This can be used to release resources earlier than end of transaction, or to free up the cursor variable to be opened again.

You would need to run separate transactions or (ab)use dblink which does that for you.
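To check that locks taken inside a transaction stay in place until it ends, one can inspect pg_locks from the same session. This is only a sketch, assuming the tbl_org table from above with an id column; the id value is hypothetical.

```sql
BEGIN;

UPDATE tbl_org SET a_col = ARRAY[col] WHERE id = 1;   -- hypothetical id

-- Locks this session currently holds on the table.
-- An UPDATE takes a ROW EXCLUSIVE table lock, which does not conflict
-- with plain SELECTs or with writes to other rows.
SELECT locktype, mode, granted
FROM   pg_locks
WHERE  pid = pg_backend_pid()
AND    relation = 'tbl_org'::regclass;

COMMIT;   -- only now are the table lock and the row locks released
```

What makes concurrent writers wait is the row-level lock on each updated row, so the fewer rows a transaction touches, the shorter the wait.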


answered Oct 13 '22 04:10

Erwin Brandstetter
