Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle 11g - How to optimize slow parallel insert select?

we want to speed up the run of the parallel insert statement below. We are expecting to insert around 80M records and it is taking around 2 hours to finish.

INSERT /*+ PARALLEL(STAGING_EX,16) APPEND NOLOGGING */ INTO STAGING_EX (ID, TRAN_DT, 
RECON_DT_START, RECON_DT_END, RECON_CONFIG_ID, RECON_PM_ID) 
SELECT /*+PARALLEL(PM,16) */ SEQ_RESULT_ID.nextval, sysdate, sysdate, sysdate, 
'8a038312403e859201405245eed00c42', T1.ID FROM PM T1 WHERE STATUS = 1 and not 
exists(select 1 from RESULT where T1.ID = RECON_PM_ID and CREATE_DT >= sysdate - 60) and 
UPLOAD_DT >= sysdate - 1 and (FUND_SRC_TYPE = :1) 

We think that caching the results of the not exist column will speed up the inserts. How do we perform the caching? Any ideas how else to speed up the insert?

Please see below for plan statistics from Enterprise Manager. Also we noticed that the statements are not being run in parallel. Is this normal?

insert-select plan statistics from oracle em

Edit: btw, the sequence is already cached to 1M

like image 821
robertdi Avatar asked Nov 18 '13 12:11

robertdi


1 Answers

Improve statistics. The estimated number of rows is 1, but the actual number of rows is over 7 million and counting. This causes the execution plan to use a nested loop instead of a hash join. A nested loop works better for small amounts of data and a hash join works better for large amounts of data. Fixing that may be as easy as ensuring the relevant tables have accurate, current statistics. This can usually be done by gathering statistics with the default settings, for example: exec dbms_stats.gather_table_stats('SIRS_UATC1', 'TBL_RECON_PM');.

If that doesn't improve the cardinality estimate try using a dynamic sampling hint, such as /*+ dynamic_sampling(5) */. For such a long-running query it is worth spending a little extra time up-front sampling data if it leads to a better plan.

Use statement-level parallelism instead of object-level parallelism. This is probably the most common mistake with parallel SQL. If you use object-level parallelism the hint must reference the alias of the object. Since 11gR2 there is no need to worry about specifying objects. This statement only needs a single hint: INSERT /*+ PARALLEL(16) APPEND */ .... Note that NOLOGGING is not a real hint.

like image 102
Jon Heller Avatar answered Oct 14 '22 10:10

Jon Heller