
Why can't I seem to force Oracle 11g to consume more CPUs for a single SQL query

I have some humongous queries that run on gigantic tables. These queries seem to be CPU-bottlenecked and run for hours. I understand that Oracle 11g Release 2 has a lot of features to internally parallelize the execution of a query. However, no matter what kinds of hints I put in the query, I can't seem to use more than one CPU on the database box. I have a very respectable Solaris machine with 8 CPUs, yet every time I run this query, I end up just pushing one CPU to 100% and then sitting there for hours.

The hints I have tried are:

SELECT /*+ PARALLEL */ ...
SELECT /*+ PARALLEL(5) */ ...
SELECT /*+ PARALLEL(10) */ ...

None of these appeared to work when looking at overall CPU consumption on the box; it always pegged one CPU at 100%. Unfortunately, even EXPLAIN PLAN seems to take forever to run. I will try to get different explain plans with the different hints and see if that helps. Is it possible that some queries are simply un-parallelizable, even if their runtime is in the hours? The main table in this query has 335 million rows.

SQL Query Text:

http://pastie.org/8634380

System Parameters:

http://pastie.org/8634383

Edit:

Detailed Explain Plan - No Parallelism:

http://pastebin.com/HkZgbPpf

Optimizer related system parameters:

http://pastie.org/8639841

Further EDIT: We have reached out to Oracle to understand why EXPLAIN PLAN takes more than 2 hours. We are timing out trying to run the various explain plans.

asked Dec 05 '22 by NG Algo
2 Answers

The most important thing to understand about Oracle parallelism is that it's complicated. Optimizing parallelism requires a lot of Oracle knowledge, reading the manuals, checking many parameters, testing long-running queries, and a lot of skepticism.

Ask the Right Questions

Parallel problems really involve three different questions:

  1. How many parallel servers were requested?
  2. How many parallel servers were allocated?
  3. How many parallel servers were meaningfully used?

Use the Best Tools

Go straight to the best tool: SQL Monitoring with active reports. Find your SQL_ID and generate the HTML report:

select dbms_sqltune.report_sql_monitor(sql_id => 'your_sql_id', type => 'active') from dual;

This is the only way to know how much time was spent on each step in the execution plan, and it will tell you how much parallelism was effectively used, and where.

Another good option is type => 'text'. It doesn't have quite as much information, but it's quicker to look at and easier to share.

SQL Monitoring also includes the DOP requested and the DOP allocated.
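If the full monitoring report isn't available, a quick session-level check is v$pq_sesstat. This is a sketch, assuming you run it in the same session immediately after the parallel statement finishes:

```sql
-- "Server Threads" shows how many parallel servers the last statement
-- actually used; "Allocation Height" shows the DOP that was granted.
SELECT statistic, last_query, session_total
FROM   v$pq_sesstat
WHERE  statistic IN ('Server Threads', 'Allocation Height');
```

A serial execution shows zero server threads for the last query, which answers question 2 directly.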

A 100-line parallel select may run beautifully, but then everything halts at a single step because of an uncached sequence. You can stare at an explain plan, a trace, or an AWR report for hours and not see the problem. The active report makes the slow steps almost trivial to find. Do not waste time guessing where the problem lies.

However, other tools are still required. An explain plan generated with explain plan for ... and select * from table(dbms_xplan.display); will provide a few key pieces of information. Specifically, the Note section can include many reasons why the query did not request parallelism.
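For example, a sketch (the query and table name are placeholders for your own statement):

```sql
-- Hypothetical statement; substitute your real query.
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL */ COUNT(*) FROM big_table;

-- The Note section at the bottom may report things like
-- "automatic DOP: Computed Degree of Parallelism is 4" or a reason
-- the statement was not parallelized.
SELECT * FROM TABLE(dbms_xplan.display);
```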

But WHY did I get that number of parallel servers?

The relevant information is spread over several different manuals, which are very useful but occasionally inaccurate or misleading. There are many myths and much bad advice about parallelism. And the technology changes significantly with each release.

When you put together all of the reputable sources, the list of factors influencing the number of parallel servers is astonishingly large. The list below is ordered roughly by what I think are the most important factors:

  1. Inter-operation parallelism Any query using sorting or grouping will allocate twice as many parallel servers as the DOP. This is probably responsible for the myth "Oracle allocates as many parallel servers as possible!".
  2. Query hint Preferably a statement-level hint like /*+ parallel */, or possibly an object-level hint like /*+ noparallel(table1) */. If a specific step of a plan is running in serial it is usually because of object-level hints on only part of the query.
  3. Recursive SQL Some operations may run in parallel but can be effectively serialized by recursive SQL. For example, an uncached sequence on a large insert. Recursive SQL generated to parse the statement will also be serial; for example dynamic sampling queries.
  4. Alter session alter session [force|enable] parallel [query|dml|ddl]; Note that parallel DML is disabled by default.
  5. Table degree
  6. Index degree
  7. Index was cheaper Parallel hints only tell the optimizer to consider a full table scan with a certain DOP. They do not actually force parallelism. The optimizer is still free to use serial index access if it thinks it's cheaper. (The FULL hint may help solve this issue.)
  8. Plan management SQL Plan Baselines, outlines, profiles, advanced rewrite, and SQL Translators can all change the degree of parallelism behind your back. Check the Note section of the plan.
  9. Edition Only Enterprise and Personal Editions allow parallel operations. Except for the package DBMS_PARALLEL_EXECUTE.
  10. PARALLEL_ADAPTIVE_MULTI_USER
  11. PARALLEL_AUTOMATIC_TUNING
  12. PARALLEL_DEGREE_LIMIT
  13. PARALLEL_DEGREE_POLICY
  14. PARALLEL_FORCE_LOCAL
  15. PARALLEL_INSTANCE_GROUP
  16. PARALLEL_IO_CAP_ENABLED
  17. PARALLEL_MAX_SERVERS This is the upper limit for the whole system. There's a trade-off here. Running too many parallel servers at once is bad for the system. But downgrading a query to serial can be disastrous for some queries.
  18. PARALLEL_MIN_PERCENT
  19. PARALLEL_MIN_SERVERS
  20. PARALLEL_MIN_TIME_THRESHOLD
  21. PARALLEL_SERVERS_TARGET
  22. PARALLEL_THREADS_PER_CPU
  23. Number of RAC nodes Another multiplier for default DOP.
  24. CPU_COUNT If the default DOP is used.
  25. RECOVERY_PARALLELISM
  26. FAST_START_PARALLEL_ROLLBACK
  27. Profile SESSIONS_PER_USER also limits parallel servers.
  28. Resource Manager
  29. System load If parallel_adaptive_multi_user is true. Probably impossible to guess when Oracle will start throttling.
  30. PROCESSES
  31. Parallel DML restrictions Parallel DML will not work in any of these cases:
    1. COMPATIBLE < 9.2 for intra-partition
    2. INSERT VALUES, tables with triggers
    3. replication
    4. self-referential integrity or delete cascade or deferred integrity constraints
    5. accessing an object column
    6. non-partitioned table with LOB
    7. intra-partition parallelism with a LOB
    8. distributed transaction
    9. clustered tables
    10. temporary tables
  32. Scalar subqueries do not run in parallel? This is in the manual, and I wish this was true, but my tests indicate that parallelism works here in 11g.
  33. ENQUEUE_RESOURCES Hidden parameter in 10g, is this relevant any more?
  34. Index-organized tables Cannot direct-path insert to IOTs in parallel? (Is this still true?)
  35. Parallel pipelined function requirements Must use a CURSOR(?). TODO.
  36. Functions must be PARALLEL_ENABLE
  37. Type of statement Older versions restricted parallelism on DML depending on partitioning. Some of the current manuals still include this but it is certainly not true anymore.
  38. Number of partitions Only for partition-wise joins on older versions.(?)
  39. Bugs Specifically I've seen a lot of bugs with parsing. Oracle will allocate the right number of parallel servers but nothing will happen as they all wait for events like cursor: pin s wait on x.
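Item 4 above (parallel DML being disabled by default) trips up many people. A minimal sketch of a parallel direct-path insert, with hypothetical table names:

```sql
-- Parallel DML must be enabled per session; it is off by default.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 8) */ INTO target_table t
SELECT /*+ PARALLEL(s, 8) */ * FROM source_table s;

-- Direct-path inserts require a commit before the table can be re-read
-- in the same session.
COMMIT;
```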

This list is certainly not complete, and does not include 12c features. And it doesn't address operating system and hardware issues. And it doesn't answer the horribly difficult question, "what is the best degree of parallelism?" (Short answer: more is usually better, but at the expense of other processes.) Hopefully it at least gives you a sense of how difficult these problems can be, and a good place to start looking.
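As a first check on whether the system has been downgrading parallel requests at all, the cumulative instance statistics are a reasonable starting point (a sketch):

```sql
-- Cumulative counts since instance startup. Nonzero "downgraded" rows
-- mean requested DOPs are not being fully granted.
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE 'Parallel operations%';
```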

answered Dec 06 '22 by Jon Heller

So Oracle is using a star_transformation here, which is essentially a means of combining multiple bitmap indexes to give a set of rowids to access on the main table. Using rowids to access the table is not a very parallel-amenable operation, in the way that a full table scan would be (or, preferably, a scan of a subset of partitions).

The star_transformation is being used because Oracle estimates that a relatively small number of rows will match all of those conditions -- it would be interesting to hear if the estimate of 15 million rows is correct or not. 15/335 is about 4.4% of the table rows, so on the face of it the decision to use an index-based method is appropriate unless the rows end up being completely randomly scattered among the blocks and you're accessing 30% of the blocks.

Anyway, I have a feeling that if Oracle is going to choose a star_transformation then the parallel hint becomes irrelevant.

The alternative to the star_transformation would be to use a full table scan on the large table, so I think it would be useful firstly to hint for a full table scan and parallelism. You might also issue an alter session command to disable star transformations temporarily as well, but the important part of hinting is to say exactly what access method you want before you dip into the parallelism.
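A sketch of that combination (column and table names are hypothetical; the hints reference the alias of your large table):

```sql
-- Temporarily prevent the star transformation for this session only.
ALTER SESSION SET star_transformation_enabled = FALSE;

-- FULL pins the access method to a full scan; PARALLEL then sets the
-- DOP for that scan.
SELECT /*+ FULL(f) PARALLEL(f, 8) */
       d.some_column, SUM(f.some_measure)
FROM   fact_table f
JOIN   dim_table  d ON d.dim_key = f.dim_key
GROUP  BY d.some_column;
```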

Further Note: Incidentally, the temporary tables that you see in the plan are due to the star transformation -- when the dimension tables are first scanned to find the applicable rows, Oracle eliminates the need to find the useful subset again during the join-back phase of the query by storing them in a temporary table. You can disable this behaviour, but it's probably doing good here.

Another Note: Had another look at the explain plan -- those times of 00:27:44, 00:27:45, etc. are cumulative. They include, for example, the 00:27:43 for the TABLE ACCESS BY INDEX ROWID of /B28/EUDIQSBV. If you take that into consideration, you see that the hash joins take a few seconds each, and the performance hog is the HASH GROUP BY at around 3 hours. It's using 4GB of temp space according to the estimate. That is probably a serious drain -- monitor the execution using V$SQL_WORKAREA to see how many passes it's doing, but PQ is an answer to that as well, as long as you have the storage bandwidth.
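A sketch of that check, substituting your real SQL_ID (for a statement still running, v$sql_workarea_active is the in-flight equivalent):

```sql
-- One row per work area (e.g. the HASH GROUP BY). Onepass and multipass
-- executions indicate the hash/sort spilled to temp one or more times.
SELECT operation_type, last_execution,
       optimal_executions, onepass_executions, multipasses_executions
FROM   v$sql_workarea
WHERE  sql_id = 'your_sql_id';
```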

answered Dec 06 '22 by David Aldridge