
Speed of many individual updates vs. one large update

Tags: sql, postgresql

I have a table with a primary id column (automatically indexed), two sub-id columns (also indexed), and 12 columns of type geometry(Polygon). If I am going to update ~2 million rows on this table, is it faster for me to run ~2 million individual update statements like

update TABLE_NAME set ( COLUMNS ) = ( VALUES ) where ID_COLUMN = NEXT_ID

or is it faster to do some smaller number of larger update statements like in this answer

update TABLE_NAME as update_t set
COLUMNS = new_vals.COLUMNS
from (values
(id, polygon1val, polygon2val, ... polygon12val),   /* row 1 */
(id, polygon1val, polygon2val, ... polygon12val),   /* row 2 */
...                                                 /* ...   */
(id, polygon1val, polygon2val, ... polygon12val)    /* row N */
) as new_vals( COLUMNS )
where new_vals.id = update_t.id

If the latter, do you have any suggestions on what a good N might be? Is N = ~2mil, or some smaller subset (that I would repeat until they're all done)?

EDIT: Obviously, in the former case I would use a prepared statement. But I also wonder, in the latter case, is there any benefit in trying to use a prepared statement?

I'm using PostgreSQL 9.2.
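To make the batched form concrete, here is a minimal sketch of how such a multi-row statement could be generated client-side. The helper name and the `%s` placeholder style (as used by drivers like psycopg2) are illustrative assumptions, not part of the original question:

```python
def build_batched_update(table, key, cols, n_rows):
    """Render a multi-row UPDATE ... FROM (VALUES ...) statement with one
    %s-placeholder tuple of (id + column values) per row.
    Illustrative sketch only; identifiers are assumed to be trusted."""
    # One placeholder tuple per row: (id, col1, col2, ...)
    row = "(" + ", ".join(["%s"] * (1 + len(cols))) + ")"
    values = ",\n".join([row] * n_rows)
    # SET col = new_vals.col for every updated column
    assignments = ", ".join(f"{c} = new_vals.{c}" for c in cols)
    return (
        f"UPDATE {table} AS t SET {assignments}\n"
        f"FROM (VALUES\n{values}\n) AS new_vals (id, {', '.join(cols)})\n"
        f"WHERE new_vals.id = t.{key}"
    )
```

The flat parameter list passed to the driver would then contain `1 + len(cols)` values per row, in the same order as the placeholder tuples.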

asked Nov 14 '13 by caps


1 Answer

In general, the better you can batch operations into sets, the more options the database has for making things fast. If you run the updates individually, the only option is something like "locate the one affected row, delete it, insert the new one".

If you can batch the updates then the planner gets to decide whether a sequential scan may be faster than a bunch of index scans (and it may well be, since you get to leverage read-ahead caching). In other words, one command updating a lot of rows almost always performs better than a lot of commands updating a single row, even aside from planning overhead.
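If a single 2-million-row statement is too large (for memory or lock-duration reasons), a common middle ground is to feed the batched UPDATE in fixed-size chunks. A minimal sketch of the chunking side, with a hypothetical batch size of 10,000:

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`.
    Each list would feed one multi-row UPDATE ... FROM (VALUES ...)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Each yielded batch would then be bound as the VALUES list of one batched statement, repeated until all rows are done.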

answered Oct 27 '22 by Chris Travers