We have an UPDATE in production (below) which processes roughly the same number of rows each day but with drastically different runtimes: some days the query finishes in 2 minutes, while on other days it takes 20 minutes. Per my analysis of the AWR data, the culprit is I/O wait time; whenever the query slows down, the cache hit ratio drops due to increased physical reads.
The outline of the query itself is below:
update /*+ nologging parallel ( a 12 ) */ huge_table1 a
set col = 1
where col1 > 'A'
and col2 < 'B'
and exists ( select /*+ parallel ( b 12 ) */ 1
from huge_table2 b
where b.col3 = a.col3 );
huge_table1 and huge_table2 each contain about 100 million rows, and the execution statistics are below:
Day      EXECUTIONS  ELAPSED_TIME_S_1EXEC  CPU_TIME_S_1EXEC  IOWAIT_S_1EXEC  ROWS_PROCESSED_1EXEC  BUFFER_GETS_1EXEC  DISK_READS_1EXEC  DIRECT_WRITES_1EXEC
-------  ----------  --------------------  ----------------  --------------  --------------------  -----------------  ----------------  -------------------
      1           1               133.055            69.110          23.325           2178085.000        3430367.000         90522.000           42561.000
      2           1               123.580            65.020          20.282           2179404.000        3341566.000         86614.000           38925.000
      3           1              1212.762            72.800        1105.084           1982658.000        3131695.000        268260.000           38446.000
      4           1              1085.773            59.600         996.642           1965309.000        2954480.000        200612.000           26790.000
As seen above, the LIO (logical I/O, i.e. buffer gets) has remained almost the same in each case, yet the elapsed time increased on days 3 and 4 due to increased I/O waits, which, if my assumption is correct, were caused by an increase in PIO (physical I/O, i.e. disk reads). Per Tom Kyte, tuning should focus on reducing LIO rather than PIO, the idea being that as LIO drops, so will PIO. But in this case the LIO has been constant throughout while the PIO has varied significantly.
My question - What tuning strategy could be adopted here?
I would:
-> Check the execution plan for both the fast and the slow case (a sketch of how to pull it from AWR follows this list).
-> Check the I/O subsystem health.
-> Monitor the server at the time this runs and make sure the I/O subsystem is not saturated by another process.
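A minimal sketch of that first check, assuming Diagnostics Pack licensing and using 'abc123xyz456' as a placeholder SQL_ID (take the real one from AWR). Comparing PLAN_HASH_VALUE across snapshots tells you whether the fast and slow days ran different plans:

-- Did the plan (or its I/O profile) change between fast and slow days?
SELECT snap_id, plan_hash_value,
       elapsed_time_delta / 1e6 AS elapsed_s,
       buffer_gets_delta, disk_reads_delta
FROM   dba_hist_sqlstat
WHERE  sql_id = 'abc123xyz456'
ORDER  BY snap_id;

-- Show every plan AWR has captured for the statement.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('abc123xyz456'));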
Also, what kind of I/O is leading the read events? Sequential, parallel, scattered? ... here you can get a lead on the strategy the plan is following to perform the update...
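One way to answer that (again assuming ASH is licensed, same placeholder SQL_ID as above) is to count the wait events sampled for the statement; the dominant event reveals the read strategy:

-- 'db file sequential read' = single-block reads (index probes),
-- 'db file scattered read'  = multiblock reads into the buffer cache,
-- 'direct path read'        = multiblock reads bypassing the cache (typical of parallel full scans).
SELECT event, COUNT(*) AS samples
FROM   dba_hist_active_sess_history
WHERE  sql_id = 'abc123xyz456'
AND    event IS NOT NULL
GROUP  BY event
ORDER  BY samples DESC;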
Is the buffer cache being resized? A small and cold buffer cache which gets resized during this big execution could force blocks to be read from disk into the cache before they can be updated.
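That is easy to verify; a quick sketch against V$SGA_RESIZE_OPS (use V$MEMORY_RESIZE_OPS instead if MEMORY_TARGET is in effect):

-- Recent automatic grow/shrink operations on the buffer cache.
SELECT component, oper_type, initial_size, final_size,
       start_time, end_time, status
FROM   v$sga_resize_ops
WHERE  component = 'DEFAULT buffer cache'
ORDER  BY start_time DESC;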
Some ideas based on the data you showed... please let us know what came out!
Recently I had a problem with a huge update. I found a good solution based on a parallel pipelined function which decreased the update time significantly. My proposition is not exactly what you asked for, but maybe this approach could give you a short and stable runtime from day to day:
Create type:
CREATE type test_num_arr AS TABLE of INTEGER;
/
Make the updating pipelined function (you can of course adjust it):
create or replace FUNCTION test_parallel_update (
    test_cur IN SYS_REFCURSOR
)
RETURN test_num_arr
PARALLEL_ENABLE (PARTITION test_cur BY ANY)  -- lets Oracle spread cursor rows across PQ slaves
PIPELINED
IS
    -- Each parallel slave commits its own slice independently.
    PRAGMA AUTONOMOUS_TRANSACTION;
    TYPE num_tab_t IS TABLE OF NUMBER(38);
    pk_tab  num_tab_t;
    cnt     INTEGER := 0;
BEGIN
    LOOP
        -- Fetch primary keys in batches to bound PGA memory use.
        FETCH test_cur BULK COLLECT INTO pk_tab LIMIT 1000;
        EXIT WHEN pk_tab.COUNT() = 0;

        -- One round trip to the SQL engine per batch.
        FORALL i IN pk_tab.FIRST .. pk_tab.LAST
            UPDATE huge_table1 a
            SET    col = 1
            WHERE  col1 > 'A'
            AND    col2 < 'B'
            AND    EXISTS ( SELECT 1
                            FROM   huge_table2 b
                            WHERE  b.col3 = a.col3 )
            AND    id = pk_tab(i);

        cnt := cnt + pk_tab.COUNT;
    END LOOP;

    CLOSE test_cur;
    COMMIT;

    -- Report how many ids this slave processed.
    PIPE ROW(cnt);
    RETURN;
END;
/
Lastly, run your update:
SELECT * FROM TABLE(test_parallel_update(CURSOR(SELECT id FROM huge_table1)));
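One design note: as written, the cursor streams every id in huge_table1 to the slaves, and the filters are only re-applied row by row inside the FORALL update. Assuming col1 and col2 are columns of huge_table1 (as in the original statement), pushing the predicates into the driving cursor shrinks the work each slave receives:

SELECT *
FROM   TABLE(test_parallel_update(CURSOR(
           SELECT id
           FROM   huge_table1
           WHERE  col1 > 'A'
           AND    col2 < 'B')));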
Approach based on: http://www.orafaq.com/node/2450
To answer your question about the strategy: you must of course choose LIO. Row access in the buffer cache is much faster than a disk operation.
With respect to your problem, we see that on the first days the execution time is good and on the last days it is not. If you have indexes on the join columns (b.col3 = a.col3) and there are a lot of insertions into the tables, the statistics may be out of date, so your query can no longer use the index efficiently and reads more blocks; indeed, in your execution statistics we see an increase in disk reads.
In this case it would be necessary to regather statistics, e.g.:
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'TABLE_NAME');
You should gather statistics periodically with the scheduler, depending on how much your data volume changes.
You could schedule, during the day, a job that gathers just the index statistics with:
DBMS_STATS.GATHER_INDEX_STATS
and, in the evening:
DBMS_STATS.GATHER_TABLE_STATS
which gathers table and column (and index) statistics.
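A minimal sketch of such a scheduled job (the job name, schema, and the 22:00 schedule are placeholders to adapt):

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'NIGHTLY_STATS_HUGE_TABLES',   -- placeholder name
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[BEGIN
                            DBMS_STATS.GATHER_TABLE_STATS('YOUR_SCHEMA', 'HUGE_TABLE1');
                            DBMS_STATS.GATHER_TABLE_STATS('YOUR_SCHEMA', 'HUGE_TABLE2');
                          END;]',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=22',        -- every evening at 22:00
    enabled         => TRUE);
END;
/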
In addition to your question about the possibilities, there is also the option of changing the data model. On large volumes, partitioned tables are a good approach to reducing I/O.
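Purely as an illustration (the column list and partition count are assumptions), a hash-partitioned variant of huge_table1; partitioning both tables on col3 the same way would also enable partition-wise joins for the EXISTS probe:

CREATE TABLE huge_table1 (
  id    NUMBER(38) PRIMARY KEY,
  col   NUMBER,
  col1  VARCHAR2(10),
  col2  VARCHAR2(10),
  col3  NUMBER
)
PARTITION BY HASH (col3) PARTITIONS 16;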
Hoping that this can help.