I'm trying to update multiple rows using postgres, I'm using this code:
UPDATE foobar SET column_a = CASE
WHEN column_b = '123' THEN 1
WHEN column_b = '345' THEN 2
END;
If I create a new table this works flawlessly, but when running this on a large table with 8 million rows this hangs indefinitely. I've tried first in Admineer (web interface) and also in console.
However this works just fine:
UPDATE foobar SET column_a=1 WHERE column_b='123';
I'm hesitant to implement this approach into my code since I'll have thousands of updates at once and would prefer to put them in one statement. Any ideas as to why the first example would hang postgres and the second would work just fine? I just double checked and I have no rules applied on the table.
PostgreSQL normally stores its table data in chunks of 8KB. The number of these blocks is limited to a 32-bit signed integer (just over two billion), giving a maximum table size of 16TB.
Mostly what happens when you try to UPDATE is that Postgres will acquire a lock on the row that you want to change. If you have two update statements running at the same time on the same row, then the second must wait for the first to process.
The statement:
CASE
WHEN column_b = '123' THEN 1
WHEN column_b = '345' THEN 2
END;
.. is just short for:
CASE
WHEN column_b = '123' THEN 1
WHEN column_b = '345' THEN 2
ELSE NULL
END
Meaning, without a WHERE
clause, your UPDATE
statement is not just "trying", it actually updates every single row in the table, most of them to NULL
.
Maybe, a NOT NULL
constraint on the column prevented data loss ...
I'll have thousands of updates at once and would prefer to put them in one statement.
Much faster (and shorter) for large sets:
UPDATE foobar f
SET column_a = val.a
FROM (
VALUES
(123, 1)
,(345, 2)
) val(b, a)
WHERE f.column_b = val.b
Joining to a set beats iterating through a long list of CASE
branches for every row easily. The difference will grow rapidly with longer lists.
Also, be sure to have an index on column_b
either way.
You can replace the VALUES
expression with any table, view or subselect yielding appropriate rows.
Note:
I am assuming that column_a
and column_b
are of type integer
. In this case, the single quotes around '123'
in your question were never helpful. You'd better use a numeric literal instead of a string literal. (Even though it works with string literals, too.)
A string literal like '123'
defaults to unknown
type.
A numeric literal like 123
defaults to integer
- or bigint
/ numeric
if the number is too big.
If you were dealing with non-default data types, you would have to cast explicitly. Would look like:
...
FROM (
VALUES
('123'::sometype, '1'::sometype) -- first row defines row type
,('345', '2')
) val(b, a)
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With