
How to speed up a slow UPDATE query

Tags: sql, postgresql

I have the following UPDATE query:

UPDATE Indexer.Pages SET LastError=NULL where LastError is not null;

Right now, this query takes about 93 minutes to complete. I'd like to find ways to make this a bit faster.

The Indexer.Pages table has around 506,000 rows, and about 490,000 of them contain a value for LastError, so I doubt I can take advantage of any indexes here.
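One way to confirm that an index would not help (a quick check of my own, not something I've dug into deeply) is to look at the query plan; with ~490,000 of ~506,000 rows matching, the planner should choose a sequential scan regardless:

EXPLAIN
UPDATE Indexer.Pages SET LastError=NULL where LastError is not null;
-- expected: Seq Scan on pages - an index on LastError would not be used
-- for a predicate that matches ~97% of the table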

The table (when uncompressed) has about 46 gigs of data in it, however the majority of that data is in a text field called html. I believe simply loading and unloading that many pages is causing the slowdown. One idea would be to make a new table with just the Id and the html field, and keep Indexer.Pages as small as possible. However, testing this theory would be a decent amount of work since I actually don't have the hard disk space to create a copy of the table. I'd have to copy it over to another machine, drop the table, then copy the data back which would probably take all evening.
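For reference, the split I have in mind would look roughly like this (a hypothetical sketch; the new table and constraint names are made up, and I haven't actually run it, for the disk-space reasons above):

CREATE TABLE indexer.page_content
(
  id   uuid NOT NULL,
  html text NOT NULL,
  CONSTRAINT page_content_pkey PRIMARY KEY (id),
  CONSTRAINT page_content_id_fkey FOREIGN KEY (id)
    REFERENCES indexer.pages (id)
);

INSERT INTO indexer.page_content (id, html)
SELECT id, html FROM indexer.pages;

ALTER TABLE indexer.pages DROP COLUMN html;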

Ideas? I'm using Postgres 9.0.0.

UPDATE:

Here's the schema:

CREATE TABLE indexer.pages
(
  id uuid NOT NULL,
  url character varying(1024) NOT NULL,
  firstcrawled timestamp with time zone NOT NULL,
  lastcrawled timestamp with time zone NOT NULL,
  recipeid uuid,
  html text NOT NULL,
  lasterror character varying(1024),
  missingings smallint,
  CONSTRAINT pages_pkey PRIMARY KEY (id ),
  CONSTRAINT indexer_pages_uniqueurl UNIQUE (url )
);

I also have two indexes:

CREATE INDEX idx_indexer_pages_missingings
  ON indexer.pages
  USING btree
  (missingings )
  WHERE missingings > 0;

and

CREATE INDEX idx_indexer_pages_null
  ON indexer.pages
  USING btree
  (recipeid )
  WHERE NULL::boolean;

There are no triggers on this table, and there is one other table that has a FK constraint on Pages.PageId.

asked Jun 18 '12 by Mike Christensen


1 Answer

What @kgrittn posted as a comment is the best answer so far. I am merely filling in details.

Before you do anything else, you should upgrade PostgreSQL to a current version, or at least to the latest security release of your major version. See the versioning guidelines of the project.

I also want to stress what Kevin mentioned about indexes involving the column LastError. Normally, HOT updates can recycle dead rows on a data page and make UPDATEs a lot faster - effectively removing (most of) the need for vacuuming. Related:

  • Redundant data in update statements
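To check whether UPDATEs on this table are currently HOT, you can compare total vs. HOT update counts in the statistics views (a sanity check of mine, not part of Kevin's comment):

SELECT relname, n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  schemaname = 'indexer'
AND    relname = 'pages';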

If your column is used in any index in any way, HOT UPDATEs are disabled, because it would break the index(es). If that is the case, you should be able to speed up the query a lot by deleting all of these indexes before you UPDATE and recreating them afterwards.
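Applied to the two partial indexes from the schema in the question, a sketch:

DROP INDEX indexer.idx_indexer_pages_missingings;
DROP INDEX indexer.idx_indexer_pages_null;

-- ... run the big UPDATE ...

CREATE INDEX idx_indexer_pages_missingings
  ON indexer.pages (missingings)
  WHERE missingings > 0;

CREATE INDEX idx_indexer_pages_null
  ON indexer.pages (recipeid)
  WHERE NULL::boolean;

Note that neither index involves lasterror, so in your particular case they do not disable HOT updates; dropping them still saves index maintenance for rows that cannot be updated HOT.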

In this context it would help to run multiple smaller UPDATEs (a batched sketch follows this list), if:

  • the updated column is not involved in any indexes (enabling HOT updates).
  • the UPDATE is easily divided into multiple patches in multiple transactions.
  • the rows in those patches are spread out over the table (physically, not logically).
  • there are no other concurrent transactions keeping dead tuples from being reused.
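A hedged sketch of one such patch (the LIMIT-subquery slicing and the batch size are my own illustration, not a prescription):

-- repeat until UPDATE reports 0 rows; each batch in its own transaction
BEGIN;
UPDATE indexer.pages
SET    lasterror = NULL
WHERE  id IN (
   SELECT id
   FROM   indexer.pages
   WHERE  lasterror IS NOT NULL
   LIMIT  50000
   );
COMMIT;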

Then you would not need to VACUUM in between multiple patches, because HOT updates can reuse dead tuples directly - only dead tuples from previous transactions, not from the same or concurrent ones. You may want to schedule a VACUUM at the end of the operation, or just let auto-vacuuming do its job.

The same could be done with any other index that is not needed for the UPDATE - and judging from your numbers the UPDATE is not going to use an index anyway. If you update large parts of your table, building new indexes from scratch is much faster than incrementally updating indexes with every changed row.

Also, your update is not likely to break any foreign key constraints. You could try to delete & recreate those, too. This does open a time slot where referential integrity would not be enforced. If the integrity is violated during the UPDATE, you get an error when trying to recreate the FK. If you do it all within one transaction, concurrent transactions never get to see the dropped FK, but you take a write lock on the table - same as with dropping / recreating indexes or triggers.
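A sketch for the FK the question mentions (the referencing table and constraint names other_table / other_table_pageid_fkey are hypothetical, since they are not given in the question):

ALTER TABLE other_table DROP CONSTRAINT other_table_pageid_fkey;

-- ... UPDATE ...

ALTER TABLE other_table
  ADD CONSTRAINT other_table_pageid_fkey
  FOREIGN KEY (pageid) REFERENCES indexer.pages (id);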

Lastly, disable & re-enable any triggers that are not needed for the UPDATE.

Be sure to do all of this in one transaction. Maybe do it in a number of smaller patches, so it does not block concurrent operations for too long.

So:

BEGIN;
ALTER TABLE tbl DISABLE TRIGGER user; -- disable all self-made triggers
-- DROP indexes (& fk constraints ?)
-- UPDATE ...
-- RECREATE indexes (& fk constraints ?)
ALTER TABLE tbl ENABLE TRIGGER user;
COMMIT;

You cannot run VACUUM inside a transaction block. Per documentation:

VACUUM cannot be executed inside a transaction block.

You could split your operation into a few big chunks and run in between:

VACUUM ANALYZE tbl;

If you don't have to deal with concurrent transactions you could (even more effectively):

ALTER TABLE tbl DISABLE TRIGGER user; -- disable all self-made triggers
-- DROP indexes (& fk constraints ?)

-- Multiple UPDATEs with logical slices of the table
-- each slice in its own transaction.
-- VACUUM ANALYZE tbl;  -- optionally in between, or autovacuum kicks in

-- RECREATE indexes (& fk constraints ?)
ALTER TABLE tbl ENABLE TRIGGER user;
answered Sep 28 '22 by Erwin Brandstetter