Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How should I approach updating a "hot" table?

Tags:

postgresql

I have a postgres DB with a "hot" table, meaning it is receiving many updates + deletes every second. I would like to run a query that will UPDATE about 10% of its rows.

I do not have an index on the 10% criteria and the isolation level is set to SERIALIZABLE as a DB flag (affecting all transactions). I would like this query to succeed, even if it temporarily blocks other ongoing updates. This is a one-time data migration and I won't need an index on this criteria once the transaction completes. The query will run for several minutes.

It seems I have a few paths forward, and I'm trying to determine which would be best:

  1. I could take out an explicit table lock, e.g.:
BEGIN;
LOCK TABLE hotness IN SHARE MODE;
SELECT * FROM hotness WHERE ...; -- 10% of rows affected
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = some_result WHERE ...;
COMMIT;

But I wonder if that approach would block more concurrent manipulations than is strictly necessary.

  1. I could create a temporary index in the txn:
BEGIN;
CREATE INDEX foo ON hotness (...);
SELECT * FROM hotness WHERE ...; -- 10% of rows affected
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = some_result WHERE ...;
DROP INDEX foo;
COMMIT;

This seemed appealing at first since the index would allow only certain updates to conflict (thanks to predicate locks), but it seems CREATE INDEX itself takes out a SHARE mode lock, so it seems to be strictly worse (table lock + overhead of btree creation).

  1. I could use explicit row-level locks, e.g.:
BEGIN;
SELECT * FROM hotness WHERE ... FOR NO KEY UPDATE; -- 10% of rows affected
-- make some maybe-complex business logic decisions
UPDATE hotness SET x = some_result WHERE ...;
COMMIT;

but without an index on this criteria, I think the planner would execute a SeqScan, which would effectively row-lock the entire table, which also seems worse than option 1.

  1. I could rely on the SERIALIZABLE isolation level and retry again and again, hoping to catch a window where we happen to be the "winning" txn and don't throw a serialization exception.

In my experiments, this doesn't seem at all feasible, thanks to how many concurrent inserts + updates are happening in this table while my query is running.

Any thoughts on which approach is best?

like image 306
Jake Biesinger Avatar asked Oct 31 '25 07:10

Jake Biesinger


1 Answers

As you noticed yourself, options 2. and 3. are not useful. Only 1. and 4. look promising.

If the table is really hot, I think that the only feasible option will be an explicit table lock. SHARE mode would be sufficient to prevent deadlocks, but there could still be serialization errors if concurrent transactions read the rows you want to modify. If you want to be certain that no conflicts can occur, you have to take an ACCESS EXCLUSIVE lock on the table. Both a SHARE and an ACCESS EXCLUSIVE lock on the table are problematic, because they will cancel concurrent autovacuum runs. So only consider this solution if this update of 10% of the table does not occur often.

To see if the less invasive method 4. could work, you'd have to experiment. It may well be that your transaction never succeeds on a hot table.

like image 127
Laurenz Albe Avatar answered Nov 02 '25 00:11

Laurenz Albe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!