I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column in this table and I need to fill a value for every row in the column, using the corresponding row value of another column in this table.
I tried to use separate transactions for selecting every next chunk of 100 records and update the value for them, but still this takes hours to update all records in Oracle10 for example.
What is the most efficient way to do this in SQL, without using some dialect-specific features, so it works everywhere (Oracle, MSSQL, MySQL, PostGre etc.)?
ADDITIONAL INFO: There are no calculated fields. There are indexes. Used generated SQL statements which update the table row by row.
There are a couple of ways to do it. INSERT INTO students (id, score1, score2) VALUES (1, 5, 8), (2, 10, 8), (3, 8, 3), (4, 10, 7) ON DUPLICATE KEY UPDATE score1 = VALUES(score1), score2 = VALUES(score2);
To optimize update operations, we should keep the transaction size as short as possible. We need to consider the lock escalation mode of the modified table to minimize the usage of too many resources. Analyzing the execution plan may help to resolve performance bottlenecks of the update query.
The usual way is to use UPDATE:
UPDATE mytable SET new_column = <expr containing old_column>
You should be able to do this is a single transaction.
As Marcelo suggests:
UPDATE mytable SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable SET new_column = <expr containing old_column> WHERE new_column IS NULL AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
EDIT:
A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE
API.
Sample code (from Oracle docs):
DECLARE l_sql_stmt VARCHAR2(1000); l_try NUMBER; l_status NUMBER; BEGIN -- Create the TASK DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask'); -- Chunk the table by ROWID DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100); -- Execute the DML in parallel l_sql_stmt := 'update EMPLOYEES e SET e.salary = e.salary + 10 WHERE rowid BETWEEN :start_id AND :end_id'; DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE, parallel_level => 10); -- If there is an error, RESUME it for at most 2 times. l_try := 0; l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask'); WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED) LOOP l_try := l_try + 1; DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask'); l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask'); END LOOP; -- Done with processing; drop the task DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask'); END; /
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With