Efficient way to update all rows in a table

I have a table with a lot of records (possibly more than 500,000 or 1,000,000). I added a new column to this table, and I need to populate it for every row using the value of another column in the same row.

I tried using separate transactions, selecting the next chunk of 100 records each time and updating their values, but this still takes hours to update all the records in Oracle 10, for example.

What is the most efficient way to do this in SQL, without using dialect-specific features, so that it works everywhere (Oracle, MSSQL, MySQL, PostgreSQL, etc.)?

ADDITIONAL INFO: There are no calculated fields. There are indexes. I used generated SQL statements that update the table row by row.

asked Apr 14 '10 by m_pGladiator


People also ask

How can we update multiple rows at a time into the table?

There are a couple of ways to do it. In MySQL, for example:

INSERT INTO students (id, score1, score2)
VALUES (1, 5, 8), (2, 10, 8), (3, 8, 3), (4, 10, 7)
ON DUPLICATE KEY UPDATE
  score1 = VALUES(score1),
  score2 = VALUES(score2);
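The multi-row upsert above can be sketched in a runnable form. This is an illustrative example only: it uses Python's sqlite3 module and SQLite's ON CONFLICT clause (the SQLite 3.24+ equivalent of MySQL's ON DUPLICATE KEY UPDATE), and the table and column names match the snippet above.

```python
import sqlite3

# Demo of a multi-row upsert: existing rows are updated, missing rows inserted.
# SQLite spells it ON CONFLICT ... DO UPDATE; MySQL uses ON DUPLICATE KEY UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, score1 INTEGER, score2 INTEGER)"
)
conn.execute("INSERT INTO students VALUES (1, 0, 0)")  # row 1 exists; rows 2-4 don't

conn.executemany(
    """INSERT INTO students (id, score1, score2) VALUES (?, ?, ?)
       ON CONFLICT(id) DO UPDATE SET
         score1 = excluded.score1,
         score2 = excluded.score2""",
    [(1, 5, 8), (2, 10, 8), (3, 8, 3), (4, 10, 7)],
)
conn.commit()

rows = conn.execute("SELECT id, score1, score2 FROM students ORDER BY id").fetchall()
print(rows)  # row 1 was updated in place; rows 2-4 were inserted
```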

How do you optimize an update query?

To optimize update operations, we should keep transactions as small as possible. We also need to consider the lock escalation mode of the modified table to avoid consuming too many resources. Analyzing the execution plan may help to resolve performance bottlenecks in the update query.


2 Answers

The usual way is to use UPDATE:

UPDATE mytable
   SET new_column = <expr containing old_column>;

You should be able to do this in a single transaction.
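As a minimal runnable sketch of this single-statement approach, here is the same idea using Python's sqlite3 module and a hypothetical table: one UPDATE fills new_column from old_column for every row, inside one transaction.

```python
import sqlite3

# Hypothetical table for the demo: old_column is populated, new_column is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (old_column INTEGER, new_column INTEGER)")
conn.executemany(
    "INSERT INTO mytable (old_column) VALUES (?)", [(i,) for i in range(1000)]
)
conn.commit()

# A single UPDATE touches every row; sqlite3 runs it inside one
# transaction, finalized by commit().
cur = conn.execute("UPDATE mytable SET new_column = old_column * 2")
conn.commit()
print(cur.rowcount)  # 1000 rows updated
```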

answered Oct 09 '22 by Marcelo Cantos


As Marcelo suggests:

UPDATE mytable
   SET new_column = <expr containing old_column>;

If this takes too long and fails with "snapshot too old" errors (e.g. if the expression queries another highly active table), and if the new value for the column is always NOT NULL, you can update the table in batches:

UPDATE mytable
   SET new_column = <expr containing old_column>
 WHERE new_column IS NULL
   AND ROWNUM <= 100000;

Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
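The batch-and-commit loop described above can be sketched in runnable form. This is an illustrative translation to Python's sqlite3 module, not Oracle: SQLite has no ROWNUM, so a LIMITed rowid subquery plays that role. Table names are hypothetical and the batch size is kept tiny for the demo.

```python
import sqlite3

# 25 rows with new_column NULL; we fill them in batches of 10, committing
# after each batch, until an update reports 0 rows changed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (old_column INTEGER, new_column INTEGER)")
conn.executemany(
    "INSERT INTO mytable (old_column) VALUES (?)", [(i,) for i in range(25)]
)
conn.commit()

BATCH = 10
passes = 0
while True:
    # Update only rows not yet filled, at most BATCH at a time.
    cur = conn.execute(
        """UPDATE mytable SET new_column = old_column * 2
           WHERE rowid IN (SELECT rowid FROM mytable
                           WHERE new_column IS NULL LIMIT ?)""",
        (BATCH,),
    )
    conn.commit()  # commit after each batch, as the answer suggests
    passes += 1
    if cur.rowcount == 0:  # "0 rows updated" -> done
        break

print(passes)  # batches of 10, 10, 5, then an empty pass: 4 passes total
```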

EDIT:

A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE API.

Sample code (from Oracle docs):

DECLARE
  l_sql_stmt VARCHAR2(1000);
  l_try NUMBER;
  l_status NUMBER;
BEGIN
  -- Create the TASK
  DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');

  -- Chunk the table by ROWID
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);

  -- Execute the DML in parallel
  l_sql_stmt := 'update EMPLOYEES e
      SET e.salary = e.salary + 10
      WHERE rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  END LOOP;

  -- Done with processing; drop the task
  DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/

Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333

answered Oct 09 '22 by Jeffrey Kemp