Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres hangs on large table when trying to update multiple rows with a CASE statement

I'm trying to update multiple rows using postgres, I'm using this code:

UPDATE foobar SET column_a = CASE
  WHEN column_b = '123' THEN 1
  WHEN column_b = '345' THEN 2
END;

If I create a new table this works flawlessly, but when running this on a large table with 8 million rows this hangs indefinitely. I've tried first in Admineer (web interface) and also in console.

However this works just fine:

UPDATE foobar SET column_a=1 WHERE column_b='123';

I'm hesitant to implement this approach into my code since I'll have thousands of updates at once and would prefer to put them in one statement. Any ideas as to why the first example would hang postgres and the second would work just fine? I just double checked and I have no rules applied on the table.

like image 372
newUserNameHere Avatar asked Sep 14 '13 03:09

newUserNameHere


People also ask

How big is too big for a PostgreSQL table?

PostgreSQL normally stores its table data in chunks of 8KB. The number of these blocks is limited to a 32-bit signed integer (just over two billion), giving a maximum table size of 16TB.

Does update statement lock the table Postgres?

Mostly what happens when you try to UPDATE is that Postgres will acquire a lock on the row that you want to change. If you have two update statements running at the same time on the same row, then the second must wait for the first to process.


1 Answers

The problem was ..

The statement:

CASE
  WHEN column_b = '123' THEN 1
  WHEN column_b = '345' THEN 2
END;

.. is just short for:

CASE
  WHEN column_b = '123' THEN 1
  WHEN column_b = '345' THEN 2
  ELSE NULL
END

Meaning, without a WHERE clause, your UPDATE statement is not just "trying", it actually updates every single row in the table, most of them to NULL.
Maybe, a NOT NULL constraint on the column prevented data loss ...

The better solution is ..

I'll have thousands of updates at once and would prefer to put them in one statement.

Much faster (and shorter) for large sets:

UPDATE foobar f
SET    column_a = val.a
FROM  (
   VALUES
     (123, 1)
    ,(345, 2)
   ) val(b, a)
WHERE f.column_b = val.b

Joining to a set beats iterating through a long list of CASE branches for every row easily. The difference will grow rapidly with longer lists.

Also, be sure to have an index on column_b either way.

You can replace the VALUES expression with any table, view or subselect yielding appropriate rows.

Note:
I am assuming that column_a and column_b are of type integer. In this case, the single quotes around '123' in your question were never helpful. You'd better use a numeric literal instead of a string literal. (Even though it works with string literals, too.)

A string literal like '123' defaults to unknown type.
A numeric literal like 123 defaults to integer - or bigint / numeric if the number is too big.

If you were dealing with non-default data types, you would have to cast explicitly. Would look like:

...
FROM  (
   VALUES
     ('123'::sometype, '1'::sometype)  -- first row defines row type
    ,('345', '2')
   ) val(b, a)
...
like image 118
Erwin Brandstetter Avatar answered Oct 13 '22 18:10

Erwin Brandstetter