This query consists of 16 equal steps.
Every step is doing the same calculation on the same dataset (a single row),
but last steps take too much time for it.
with t0 as (select 0 as k from dual)
,t1 as (select k from t0 where k >= (select avg(k) from t0))
,t2 as (select k from t1 where k >= (select avg(k) from t1))
,t3 as (select k from t2 where k >= (select avg(k) from t2))
,t4 as (select k from t3 where k >= (select avg(k) from t3))
,t5 as (select k from t4 where k >= (select avg(k) from t4))
,t6 as (select k from t5 where k >= (select avg(k) from t5))
,t7 as (select k from t6 where k >= (select avg(k) from t6))
,t8 as (select k from t7 where k >= (select avg(k) from t7))
,t9 as (select k from t8 where k >= (select avg(k) from t8))
,t10 as (select k from t9 where k >= (select avg(k) from t9))
,t11 as (select k from t10 where k >= (select avg(k) from t10))
,t12 as (select k from t11 where k >= (select avg(k) from t11)) -- 0.5 sec
,t13 as (select k from t12 where k >= (select avg(k) from t12)) -- 1.3 sec
,t14 as (select k from t13 where k >= (select avg(k) from t13)) -- 4.5 sec
,t15 as (select k from t14 where k >= (select avg(k) from t14)) -- 30 sec
,t16 as (select k from t15 where k >= (select avg(k) from t15)) -- 4 min
select k from t16
Subquery t10 completes immediately, but the entire query (t16) requires 4 minutes to complete.
Q1.
Why calculation times of identical subqueries on the same data are very different?
Q2.
It looks like a bug as it runs very fast on Oracle 9 and very slow on Oracle 11.
In fact, every select statement with long and complex with-clause would behave the same way.
Is it a known bug? (I don't have access to metalink)
What workaround is recommended?
Q3.
I must write code for Oracle 11 and I must do all my calculations in single select statement.
I can't split my long statement in two separate statements to speed it up.
Does there exist a hint in Oracle (or maybe some trick) to make the whole query (t16) be completed in a reasonable time (e.g., within one second)? I tried to find such one but to no avail.
BTW, execution plan is excellent, and cost behaves as a linear function (not exponential) of number of steps.
CTE can be more readable: Another advantage of CTE is CTE is more readable than Subqueries. Since CTE can be reusable, you can write less code using CTE than using a subquery. Also, people tend to follow logic and ideas easier in sequence than in a nested fashion.
cte4 AND CTE5, that takes foreover to run and doesnt even stop, only 200 recs are in the CTE. The LEFT JOIN in the end on 2 ctes i.e cte4 and cte5 is causing the slowness.
For most cases using CTEs or subquery or derived tables does not make a huge performance impact.
A CTE (common table expression) is a named subquery defined in a WITH clause. You can think of the CTE as a temporary view for use in the statement that defines the CTE. The CTE defines the temporary view's name, an optional list of column names, and a query expression (i.e. a SELECT statement).
Q1: Seems that there are nothing about calculation time, just bug in optimizer algorithm which make it mad while calculating a best execution plan.
Q2: There are a number of known and fixed bugs in Oracle 11.X.0.X related to optimization of nested queries and query factoring. But it's very hard to find a concrete issue.
Q3: There are two undocumented hints: materialize
and inline
but no one of them works for me while I tried your example. It's possible that some changes in server configuration or upgrading to 11.2.0.3 may increase limit of nested with
clauses: for me (on 11.2.0.3 Win7/x86) your example works fine, but increasing number of nested tables to 30 hangs a session.
Workaround may look like this:
select k from (
select k, avg(k) over (partition by null) k_avg from ( --t16
select k, avg(k) over (partition by null) k_avg from ( --t15
select k, avg(k) over (partition by null) k_avg from ( --t14
select k, avg(k) over (partition by null) k_avg from ( --t13
select k, avg(k) over (partition by null) k_avg from ( --t12
select k, avg(k) over (partition by null) k_avg from ( --t11
select k, avg(k) over (partition by null) k_avg from ( --t10
select k, avg(k) over (partition by null) k_avg from ( --t9
select k, avg(k) over (partition by null) k_avg from ( --t8
select k, avg(k) over (partition by null) k_avg from ( --t7
select k, avg(k) over (partition by null) k_avg from ( --t6
select k, avg(k) over (partition by null) k_avg from ( --t5
select k, avg(k) over (partition by null) k_avg from ( --t4
select k, avg(k) over (partition by null) k_avg from ( --t3
select k, avg(k) over (partition by null) k_avg from ( --t2
select k, avg(k) over (partition by null) k_avg from ( -- t1
select k, avg(k) over (partition by null) k_avg from (select 0 as k from dual) t0
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
)
At least it works for me on nesting level of 30 and produces totally different execution plan with WINDOW BUFFER
and VIEW
instead of LOAD TABLE AS SELECT
, SORT AGGREGATE
and TABLE ACCESS FULL
.
Update
Just installed 11.2.0.4 (Win7/32bit) and test it against initial query. Nothing changed in optimizer behavior.
There are no possibility to directly affect a CBO behavior, even with use of inline
(undocumented) or RULE
(deprecated) hints. May be some Guru knows a some variant, but it's a Top Secret for me (and Google too :-) .
Doing things in a one select statement in reasonable time is possible if a main select statement separated into a parts and placed into the function which returns a set of rows (function returning sys_refcursor or strong typed cursor), but it's not a choice if a query constructed at runtime.
Workaround with usage of XML is possible, but this variant looks like removing a tonsil through the ass hole (sorry):
.
select
extractvalue(column_value,'/t/somevalue') abc
from
table(xmlsequence((
select t2 from (
select
t0,
t1,
(
select xmlagg(
xmlelement("t",
xmlelement("k1",extractvalue(t1t.column_value,'/t/k1')),
xmlelement("somevalue", systimestamp))
)
from
table(xmlsequence(t0)) t0t,
table(xmlsequence(t1)) t1t
where
extractvalue(t1t.column_value,'/t/k1') >= (
select avg(extractvalue(t1t.column_value, '/t/k1')) from table(xmlsequence(t1))
)
and
extractvalue(t0t.column_value,'/t/k2') > 6
) t2
from (
select
t0,
(
select xmlagg(
xmlelement("t",
xmlelement("k1",extractvalue(column_value,'/t/k1')),
xmlelement("somevalue", sysdate))
)
from table(xmlsequence(t0))
where
extractvalue(column_value,'/t/k1') >= (
select avg(extractvalue(column_value, '/t/k1')) from table(xmlsequence(t0))
)
) t1
from (
select
xmlagg(xmlelement("t", xmlelement("k1", level), xmlelement("k2", level + 3))) t0
from dual connect by level < 5
)
)
)
)))
Another thing about a strange code above is that this variant applicable only if with
data sets didn't have a big number of rows.
(This is isn't a full answer. Hopefully the information here will help someone else produce a better answer.)
Q1: The optimizer rewrites the query by inlining everything. The internal statement doubles in size with each new common table expression and the statements quickly grow ginormous. For example, T15 produces a 3,162,172 character query.
Code to trace the statements:
sqlplus user/pass@orcl
alter session set events '10053 trace name context forever, level 1';
with t0 as (select 0 as k from dual)
,t1 as (select k from t0 where k >= (select avg(k) from t0))
,t2 as (select k from t1 where k >= (select avg(k) from t1))
select k from t2;
exit;
sqlplus user/pass@orcl
alter session set events '10053 trace name context forever, level 1';
with t0 as (select 0 as k from dual)
,t1 as (select k from t0 where k >= (select avg(k) from t0))
,t2 as (select k from t1 where k >= (select avg(k) from t1))
,t3 as (select k from t2 where k >= (select avg(k) from t2))
select k from t3;
exit;
If you compare the two trace files there are many differences, but most of them look very minor. The real difference is in only a single line that comes right after the string: Stmt: ******* UNPARSED QUERY IS *******
. Be careful opening the trace files if you trace larger queries. Not all editors can handle such huge lines. And the T20 file was 250MB!
SQL from first trace, after formatting:
SELECT "T1"."K" "K"
FROM (SELECT 0 "K"
FROM "SYS"."DUAL" "DUAL"
WHERE 0 >= (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1"
WHERE "T1"."K" >=
(SELECT AVG("T1"."K") "AVG(K)"
FROM (SELECT 0 "K"
FROM "SYS"."DUAL" "DUAL"
WHERE 0 >= (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1")
SQL from second trace, after formatting:
SELECT "T2"."K" "K"
FROM (SELECT "T1"."K" "K"
FROM (SELECT 0 "K"
FROM "SYS"."DUAL" "DUAL"
WHERE 0 >= (SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1"
WHERE "T1"."K" >=
(SELECT AVG("T1"."K") "AVG(K)"
FROM (SELECT 0 "K"
FROM "SYS"."DUAL" "DUAL"
WHERE 0 >=
(SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1")) "T2"
WHERE "T2"."K" >=
(SELECT AVG("T2"."K") "AVG(K)"
FROM (SELECT "T1"."K" "K"
FROM (SELECT 0 "K"
FROM "SYS"."DUAL" "DUAL"
WHERE 0 >=
(SELECT AVG(0) "AVG(K)" FROM "SYS"."DUAL" "DUAL")) "T1"
WHERE "T1"."K" >=
(SELECT AVG("T1"."K") "AVG(K)"
FROM (SELECT 0 "K"
FROM "SYS"."DUAL" "DUAL"
WHERE 0 >= (SELECT AVG(0) "AVG(K)"
FROM "SYS"."DUAL" "DUAL")) "T1")) "T2")
Q2: I wouldn't say every "complex" common table expression would behave the same way. I've seen CTEs that were much larger. It's only the extreme nesting that seems to be the problem. I couldn't find any obvious bugs on Oracle Support.
ThinkJet's code looks like a good work around. Nesting inline views is more common than nesting common table expressions.
Q3: There probably is a hint to prevent this behavior, but I'm not sure what it is. Hopefully by showing the transformed version of the query someone else can guess how to fix it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With