Slow simple update query on PostgreSQL database with 3 million rows

Tags: sql, postgresql, sql-update

I am trying to run a simple UPDATE table SET column1 = 0 on a table with about 3 million rows on PostgreSQL 8.4, but it is taking forever to finish. It has been running for more than 10 minutes.

Beforehand, I ran VACUUM and ANALYZE on the table and also tried creating some indexes (although I doubt they make any difference in this case), but nothing seems to help.

Any other ideas?

Update:

This is the table structure:

CREATE TABLE myTable
(
  id bigserial NOT NULL,
  title text,
  description text,
  link text,
  "type" character varying(255),
  generalFreq real,
  generalWeight real,
  author_id bigint,
  status_id bigint,
  CONSTRAINT resources_pkey PRIMARY KEY (id),
  CONSTRAINT author_pkey FOREIGN KEY (author_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT c_unique_status_id UNIQUE (status_id)
);

I am trying to run UPDATE myTable SET generalFreq = 0;

asked Jul 29 '10 10:07

Ricardo

3 Answers

I have to update tables of 1 or 2 billion rows, with a different value for each row. Each run makes ~100 million changes (10%). My first attempt was to group them in transactions of 300K updates, run directly against a specific partition, because PostgreSQL does not always optimize prepared queries when partitions are involved.

1. Transactions made of a batch of "UPDATE myTable SET myField=value WHERE myId=id" statements. This gives 1,500 updates/sec, which means each run would take at least 18 hours.
2. The HOT updates solution as described here, with FILLFACTOR=50. This gives 1,600 updates/sec. I use SSDs, so it is a costly improvement because it doubles the storage size.
3. Insert the updated values into a temporary table, then merge them with UPDATE...FROM. This gives 18,000 updates/sec if I do a VACUUM for each partition, 100,000 updates/sec otherwise. Cooool.

Here is the sequence of operations:

CREATE TEMP TABLE tempTable (id BIGINT NOT NULL, field(s) to be updated, CONSTRAINT tempTable_pkey PRIMARY KEY (id));

Accumulate a batch of updates in a buffer, sized according to the available RAM. When the buffer is full, when you need to switch to another table/partition, or when the run is complete:

COPY tempTable FROM buffer;
UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
COMMIT;
TRUNCATE TABLE tempTable;
VACUUM FULL ANALYZE myTable;
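A concrete version of this sequence might look like the following. It is only a sketch: it assumes the myTable definition from the question, a single updated column (generalFreq), a psql session, and two made-up rows standing in for the buffer. A plain VACUUM is used here in place of VACUUM FULL.

```sql
BEGIN;

-- Staging table holding only the key and the new value
-- (the choice of generalFreq is an assumption for illustration).
CREATE TEMP TABLE tempTable (
    id          bigint NOT NULL,
    generalFreq real,
    CONSTRAINT tempTable_pkey PRIMARY KEY (id)
);

-- Bulk-load the buffered updates; the two rows below are placeholder data.
COPY tempTable (id, generalFreq) FROM STDIN WITH CSV;
1,0.5
2,0.25
\.

-- Autovacuum does not analyze temp tables, so collect statistics by hand.
ANALYZE tempTable;

-- Apply every buffered change in one set-based statement.
UPDATE myTable a
SET    generalFreq = b.generalFreq
FROM   tempTable b
WHERE  a.id = b.id;

COMMIT;

-- Empty the staging table for the next batch, then clean up dead row versions.
TRUNCATE TABLE tempTable;
VACUUM ANALYZE myTable;
```

The COPY ... FROM STDIN form with inline data terminated by \. works when the script is fed to psql; a client program would normally stream its buffer through the driver's COPY support instead.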

That means a run now takes 1.5 hours instead of 18 hours for 100 million updates, vacuum included. To save time, it is not necessary to run a VACUUM FULL at the end, but even a fast regular VACUUM is useful to keep transaction IDs under control on the database and to avoid an unwanted autovacuum during rush hours.
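The durations quoted in this answer follow directly from the measured rates. A quick check, assuming exactly 100 million updates per run (the labels are just descriptive names for the three approaches):

```sql
-- Hours needed for 100 million updates at each measured throughput.
SELECT approach,
       updates_per_sec,
       round(100000000 / updates_per_sec / 3600, 2) AS hours
FROM (VALUES
        ('row-by-row UPDATE',            1500.0),
        ('HOT updates, FILLFACTOR=50',   1600.0),
        ('temp table, with VACUUM',     18000.0),
        ('temp table, without VACUUM', 100000.0)
     ) AS t (approach, updates_per_sec);
-- hours: 18.52, 17.36, 1.54, 0.28
```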

answered Oct 12 '22 13:10

Le Droid

Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates

Start with a better FILLFACTOR, do a VACUUM FULL to force a table rewrite, and check the HOT updates after your UPDATE query:

-- relname holds the lower-cased name, because unquoted identifiers are folded to lower case
SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'mytable';

HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.

P.S. You need version 8.3 or later.
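As a sketch of those steps, assuming the myTable definition from the question and a fill factor of 50 (an illustrative value, not a recommendation):

```sql
-- Leave half of each page free so new row versions can stay on the same page.
ALTER TABLE myTable SET (fillfactor = 50);

-- The setting only affects pages written from now on, so the existing data
-- has to be rewritten. From 9.0 on, VACUUM FULL rewrites the whole table.
VACUUM FULL myTable;

UPDATE myTable SET generalFreq = 0;

-- Compare HOT updates with the total number of updates.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  relname = 'mytable';
```

An update can only be HOT when none of the modified columns is indexed and the page still has free space, so an index on generalFreq would rule it out. The counters are collected asynchronously and may take a moment to reflect the UPDATE.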

answered Oct 12 '22 12:10

Frank Heikens

After waiting 35 minutes for my UPDATE query to finish (it still had not), I decided to try something different. So I ran this command:

CREATE TABLE table2 AS
SELECT
  id, title, description, link, "type",
  0::real AS generalFreq,  -- the column being reset, kept in its original position
  generalWeight, author_id, status_id
FROM myTable;

Then I added the indexes, dropped the old table, and renamed the new one to take its place. That took only 1.7 minutes to process, plus some extra time to recreate the indexes and constraints. But it did help! :)

Of course, that worked only because nobody else was using the database. I would need to lock the table first if this were a production environment.
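In a production setting, the lock and the swap could be wrapped in a single transaction. The following is a sketch only: myTable_new is a made-up name, the sequence name mytable_id_seq is an assumption (pg_get_serial_sequence('mytable', 'id') reports the real one), and it presumes that no views or foreign keys in other tables depend on myTable.

```sql
BEGIN;

-- Block all other access until the swap is committed.
LOCK TABLE myTable IN ACCESS EXCLUSIVE MODE;

-- Copy every column, replacing generalFreq with 0.
CREATE TABLE myTable_new AS
SELECT id, title, description, link, "type",
       0::real AS generalFreq,
       generalWeight, author_id, status_id
FROM   myTable;

-- Hand the bigserial sequence over to the new table so that
-- dropping the old table does not drop the sequence with it.
ALTER SEQUENCE mytable_id_seq OWNED BY myTable_new.id;

DROP TABLE myTable;
ALTER TABLE myTable_new RENAME TO myTable;

-- Recreate the default and the constraints from the original definition.
ALTER TABLE myTable
    ALTER COLUMN id SET DEFAULT nextval('mytable_id_seq'),
    ADD CONSTRAINT resources_pkey PRIMARY KEY (id),
    ADD CONSTRAINT author_pkey FOREIGN KEY (author_id)
        REFERENCES users (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    ADD CONSTRAINT c_unique_status_id UNIQUE (status_id);

COMMIT;

-- Refresh planner statistics for the rebuilt table.
ANALYZE myTable;
```

Because DDL is transactional in PostgreSQL, a failure at any step rolls the whole swap back and leaves the original table in place.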

answered Oct 12 '22 11:10

Ricardo
