Reduced buffer cache hits for a query cause random performance issues

We have an UPDATE in production (below) which processes more or less the same number of rows each day, but with drastically different runtimes: some days the query finishes in 2 minutes, while on other days it might take 20 minutes. Per my analysis of the AWR data, the culprit is I/O wait time: whenever the query slows down, the cache hit ratio drops due to increased physical reads.

The outline of the query itself is below:

update /*+ nologging parallel ( a 12 )  */ huge_table1 a  
set   col = 1
where  col1 > 'A'
  and    col2 < 'B'
  and    exists ( select /*+ parallel ( b 12 ) */ 1
                from   huge_table2 b
                where  b.col3 = a.col3 );

huge_table1 and huge_table2 contain about 100 million rows, and the execution statistics are below:

Day     EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------- ----------- -------------------- ---------------- -------------- -------------------- ----------------- ----------------- -------------------
      1           1              133.055           69.110         23.325          2178085.000       3430367.000         90522.000           42561.000
      2           1              123.580           65.020         20.282          2179404.000       3341566.000         86614.000           38925.000
      3           1             1212.762           72.800       1105.084          1982658.000       3131695.000        268260.000           38446.000
      4           1             1085.773           59.600        996.642          1965309.000       2954480.000        200612.000           26790.000

As seen above, the LIO (buffer gets) has remained almost the same in each case, while the elapsed time increased on days 3 and 4 due to increased I/O waits, which, if my assumption is correct, were caused by an increase in PIO (disk reads). Per Tom Kyte, tuning should focus on reducing LIO rather than PIO, since as LIO drops, so will PIO. But in this case the LIO has been constant throughout, while the PIO has varied significantly.
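
(For reference, the per-execution figures above can be pulled from AWR along these lines, assuming the statement's SQL_ID is known; 'abcd1234efgh5' is a placeholder.)

    SELECT sn.begin_interval_time,
           st.executions_delta         AS executions,
           st.elapsed_time_delta / 1e6 AS elapsed_s,    -- AWR stores times in microseconds
           st.cpu_time_delta     / 1e6 AS cpu_s,
           st.iowait_delta       / 1e6 AS iowait_s,
           st.buffer_gets_delta        AS buffer_gets,  -- LIO
           st.disk_reads_delta         AS disk_reads    -- PIO
    FROM   dba_hist_sqlstat  st
    JOIN   dba_hist_snapshot sn ON sn.snap_id = st.snap_id
    WHERE  st.sql_id = 'abcd1234efgh5'                  -- placeholder SQL_ID
    ORDER  BY sn.begin_interval_time;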

My question - What tuning strategy could be adopted here?

asked by toddlermenot

3 Answers

I would:

-> Check the execution plan for both cases (see the sketch just below for pulling historical plans out of AWR).
-> Check I/O subsystem health.
-> Monitor the server while this runs and make sure the I/O subsystem is not saturated by another process.
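
For the first point, a minimal sketch, assuming you are licensed for AWR and that 'abcd1234efgh5' stands in for the statement's real SQL_ID:

    SELECT *
    FROM   TABLE(DBMS_XPLAN.DISPLAY_AWR('abcd1234efgh5'));  -- lists every plan captured for that SQL_ID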

Also, what kind of I/O is leading the read events: sequential, parallel, scattered? That can give you a lead on the strategy the plan is following to perform the update.
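
One way to see which wait events dominate is to sample the ASH history for the statement; a sketch, again with a placeholder SQL_ID:

    SELECT event, COUNT(*) AS samples   -- 'db file sequential read' vs 'db file scattered read'
    FROM   dba_hist_active_sess_history -- vs 'direct path read' reveals the access pattern
    WHERE  sql_id = 'abcd1234efgh5'     -- placeholder SQL_ID
    AND    session_state = 'WAITING'
    GROUP  BY event
    ORDER  BY samples DESC;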

Is the buffer cache being resized? A small, cold buffer cache that gets resized during this big execution could force blocks to be read from disk into the buffer cache before they can be updated.
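
You can check for resize operations against the standard dynamic views; a sketch using V$SGA_RESIZE_OPS (V$MEMORY_RESIZE_OPS is the equivalent when Automatic Memory Management is in use):

    SELECT component, oper_type, initial_size, final_size, end_time
    FROM   v$sga_resize_ops
    WHERE  component = 'DEFAULT buffer cache'  -- shrink/grow events around the run time are suspect
    ORDER  BY end_time DESC;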

Some ideas based on the data you showed... please let us know what came out!

answered by bubooal

Recently I had a problem with a huge update. I found a good solution based on a parallel pipelined function, which decreased the update time significantly. My proposal is not exactly what you asked for, but maybe this approach could give you short and stable runtimes from day to day:

  1. Create type:

    CREATE TYPE test_num_arr AS TABLE OF INTEGER;
    /
    
  2. Make the updating pipelined function (which you can of course adjust):

    create or replace FUNCTION test_parallel_update (
        test_cur IN SYS_REFCURSOR
    )
    RETURN test_num_arr
    PARALLEL_ENABLE (PARTITION test_cur BY ANY)  -- let parallel slaves split the cursor
    PIPELINED
    IS
        -- DML inside a parallel pipelined function requires an autonomous transaction
        PRAGMA AUTONOMOUS_TRANSACTION;

        TYPE num_tab_t IS TABLE OF NUMBER(38);
        pk_tab  num_tab_t;
        cnt     INTEGER := 0;
    BEGIN
        LOOP
            -- fetch the next batch of keys from the driving cursor
            FETCH test_cur BULK COLLECT INTO pk_tab LIMIT 1000;
            EXIT WHEN pk_tab.COUNT() = 0;

            -- update the whole batch in a single round trip
            FORALL i IN pk_tab.FIRST .. pk_tab.LAST
                UPDATE huge_table1 a                 -- alias needed for the correlated subquery
                SET    col = 1
                WHERE  col1 > 'A'
                AND    col2 < 'B'
                AND    EXISTS ( SELECT 1
                                FROM   huge_table2 b
                                WHERE  b.col3 = a.col3 )
                AND    a.id = pk_tab(i);             -- assumes ID is the primary key

            cnt := cnt + pk_tab.COUNT;
        END LOOP;

        CLOSE test_cur;

        COMMIT;
        PIPE ROW(cnt);                               -- report how many keys this slave processed
        RETURN;
    END;
    /
    
  3. Lastly, run your update:

    SELECT * FROM TABLE(test_parallel_update(CURSOR(SELECT id FROM huge_table1)));
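
One refinement worth trying, assuming col1 and col2 are reasonably selective: push the original filters into the driving cursor so each slave only receives candidate keys:

    SELECT * FROM TABLE(test_parallel_update(CURSOR(
        SELECT id FROM huge_table1 WHERE col1 > 'A' AND col2 < 'B')));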
    

Approach based on: http://www.orafaq.com/node/2450

answered by rtbf


To answer your question about the strategy: you must of course focus on LIO. Row access in the buffer cache is much faster than a disk operation.

Regarding your problem: on the first days the execution time is good, and on the later days it is not. If there are indexes on the join columns (b.col3 = a.col3) and a lot of insertion into the tables, the optimizer statistics may have gone stale, so your query can no longer use the indexes effectively and reads more blocks. That matches the increase in disk reads in your execution statistics.

In this case it would be necessary to:

    EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'TABLE_NAME');

You should gather statistics periodically with the scheduler, with a frequency that depends on how quickly your data changes.

During the day you could schedule a job that gathers just the index statistics with:

    DBMS_STATS.GATHER_INDEX_STATS

and in the evening:

    DBMS_STATS.GATHER_TABLE_STATS

which gathers table, column, and index statistics.
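
A minimal scheduler sketch for the evening gather (the job name and the 2 AM schedule are assumptions; cascade => TRUE folds the index gather in):

    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'GATHER_HUGE_TABLE1_STATS',  -- assumed job name
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => ''HUGE_TABLE1'', cascade => TRUE); END;',
        repeat_interval => 'FREQ=DAILY; BYHOUR=2',      -- assumed 2 AM daily
        enabled         => TRUE);
    END;
    /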

In addition, there is also the option of changing the data model: on large volumes, partitioned tables are a good approach to reducing I/O.
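
For illustration only, a hash-partitioning sketch on the join column (the column list and partition count are assumptions; equipartitioning huge_table2 on col3 as well would enable partition-wise joins):

    CREATE TABLE huge_table1 (
      id   NUMBER PRIMARY KEY,
      col  NUMBER,
      col1 VARCHAR2(10),
      col2 VARCHAR2(10),
      col3 NUMBER
    )
    PARTITION BY HASH (col3) PARTITIONS 16;  -- 16 is an arbitrary example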

Hope that helps.

answered by Bertrand Ring