Edit: Solved! thanks to @kfinity.
AskTom suggests using select /*+ opt_param('_optimizer_use_feedback' 'false') */ at the start of your query to disable the feedback usage. This has fixed the problem for me.
tldr: Adding a randomized comment to a query makes it run consistent, removing this comment breaks it.
Warning: long.
My preferred way of working is with queries in the source as string so that they are in version control and I can see changes over time. Along with that I use dapper
and the oracle.ManagedDataAccess
NuGet package. The application in question is a WPF app (in both cases) running on .NET framework 4.7.2. I'm using Visual studio professional 2017 15.9.5.
About a year back I encountered this problem with a query. I don't remember which it was, I do know that I didn't have the time to document it and post it here. I do now and I ran into the same problem. Back then I somehow figured out that if I restarted my PC or changed the query text, it would run fine again. Just occasionally the symptoms of the problem would show up, I'd add a comment to the query, or remove the one previously there, I would test that specific function every release. I'd test it everytime because if it was faulty on my machine, it would also be faulty on the target user machine. At the time I figured it was a driver/hardware issue with my pc. The way I figured out I could fix it by changing the query text is because I would cut and paste (ctrl-x ctrl-v
) the entire query from the code to Oracle developer
and edit it there. At some point I noticed even an extra whitespace or enter would make it work again.
Back to now, I've got the problem again. This time it's different because it doesn't fail occasionally anymore. It's very consistent. Thinking back about how I figured this was a driver/hardware issue, I build and ran the application on 3 different machines, all the same results. I can run the query in Oracle developer
, using ctrl + end
to run the entire query, not just 50 lines. It runs in about 10 seconds. It runs consistently, over and over again, which makes sense if nothing changes.
If I run it from my application, it runs fine - but only once. It takes about 10 seconds as well. If I refresh the data, which runs the query again, it hangs. There are no exceptions, if I break the debugger it's just chilling on the database.Query<>()
call. If I either restart my PC or change the query, it will run - exactly once.
Of course, I went to google for help. Found some interesting articles that didn't really help me along:
https://mjsoracleblog.wordpress.com/2014/10/24/oracle-performance-mystery-wildly-varying-query-response-time/
Oracle inconsistent performance behaviour of query
Until I found this:
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1191435335912
They mention v$session_longops
which supposedly gives us insight into long running operations.
select *
from v$session_longops
where time_remaining > 0
I ran this while I had just refreshed the data in the application, causing the query to run for a second time. I see this:
The first query ran fine, indexes where fine. The 2nd time it started up a full table scan. It doesn't need this because it runs fine in the first time and in oracle developer
as well. As expected, if I leave it running (takes over 20 minutes) the table scan completes and I get the same result as the first time around.
I found that you can use explain plan for
to explain a query plan without using the GUI in Oracle developer
. This gave me 2 wildly different plans, the one in Oracle developer
always has this note: - 'PLAN_TABLE' is old version
. So I'm not sure that I can trust this information, I don't know what to do with it.
Plan from Oracle developer
Plan from code
Like I said before, adding or removing a comment or rather changing the query text fixes the problem and doesn't start a full table scan, ever. I added a comment containing DateTime.Now
to the query so that it is always different. It consistently works. While technically this fixes the problem I think it is quite a ridiculous fix to an even more ridiculous problem. I'd rather know why this happens and if maybe I'm doing something else wrong. So the question is, why does a randomized comment fix my query?
Some context: this is an ERP system, the query gets all the workorders that have a hierarchic structure, or that are just on themselves, combines them, then adds their required materials and some other information like their description. It only does this for workorders that are not closed yet.
SQL:
select
--Hierarchic info, some aliases exceeded 30 chars and had to be shorted to current state
hierarchic_workorders.ccn as HierarchicCcn ,
hierarchic_workorders.mas_loc as HierarchicMasLoc,
hierarchic_workorders.wo_num as HierarchicWoNum,
hierarchic_workorders.wo_line as HierarchicWoLine,
wo.item as HierarchicItem,
wo.revision as HierarchicRevision,
wo_item.description as HierarchicDescription,
wo_rtg.wc as HierarchicWorkCenter,
hierarchic_workorders.startdate as HierarchicStartDate,
hierarchic_workorders.mfgclosedate as HierarchicMfgClosedDate,
hierarchic_workorders.chassisnumbers as HierarchicChassisNumbers,
hierarchic_workorders.wo_level as HierarchicLevel,
hierarchic_workorders.parent_wo_num as HierarchicParentWoNum,
hierarchic_workorders.parent_wo_line as HierarchicParentWoLine,
hierarchic_workorders.parent_wo_bom_useq as HierarchicParentwobomuseq,
--wo bom info
wo_bom.ccn as WoRtgCcn,
wo_bom.mas_loc as WoRtgMasloc,
wo_bom.wo_num as WoRtgWoNum,
wo_bom.wo_line as WoRtgWoLine,
wo_bom.wo_bom_useq as WoRtgWobomUseq,
wo_bom.item as WoRtgItem,
wo_bom.revision as WoRtgRevision,
wo_bom_item.description as WoRtgDescription,
wo_bom.bom_comp_qty as WoRtgBomCompQty,
wo_bom.bom_commit as WoRtgCommit,
wo_bom.backflush as WoRtgBackflush,
wo_bom.issue_qty as WoRtgIssueQty,
wo_bom.commit_qty as WoRtgCommitQty,
wo_bom.reqd_qty as WoRtgReqdQty
from live.wo_bom
--===========================================================================================================================================================================
-- Maybe it's possible to remove this or the other min operation join in hierarchic_workorders, to make it faster - not sure if possible ====================================
--===========================================================================================================================================================================
left join(
select
wo_rtg_min_operation.min_operation,
wo_rtg.*
from live.wo_rtg
left join(
select
ccn,
mas_loc,
wo_num,
wo_line,
lpad(to_char(min(to_number(trim(operation)))), 4, ' ') as min_operation
from live.wo_rtg
group by ccn, mas_loc, wo_num, wo_line
)wo_rtg_min_operation
on wo_rtg_min_operation.ccn = wo_rtg.ccn
and wo_rtg_min_operation.mas_loc = wo_rtg.mas_loc
and wo_rtg_min_operation.wo_num = wo_rtg.wo_num
and wo_rtg_min_operation.wo_line = wo_rtg.wo_line
) wo_rtg
on wo_rtg.ccn = wo_bom.ccn
and wo_rtg.mas_loc = wo_bom.mas_loc
and wo_rtg.wo_num = wo_bom.wo_num
and wo_rtg.wo_line = wo_bom.wo_line
--This case when is painfully slow but it can't really be cached or indexed
and wo_rtg.operation = (
case when wo_bom.operation = ' ' then
wo_rtg.min_operation
else
wo_bom.operation
end
)
--===========================================================================================================================================================================
-- Find all open MPS orders and highest hierarchic PRP orders. Having these be a subquery instead of the starting data drastically increases performance ========================
--===========================================================================================================================================================================
join(
select
ccn,
mas_loc,
wo_num,
wo_line,
startdate,
mfgclosedate,
chassisnumbers,
wo_level,
parent_wo_num,
parent_wo_line,
parent_wo_bom_useq
from (
--===========================================================================================================================================================================
-- PRP ======================================================================================================================================================================
--===========================================================================================================================================================================
select
'PRP' as type,
wowob.ccn,
wowob.mas_loc,
wowob.wo_num,
wowob.wo_line,
apssplit_min_operation.operation_start_date as startdate,
wo.mfg_close_date as mfgclosedate,
trim(
trim(wo.user_alpha2) || ' ' ||
trim(wo.user_alpha3) || ' ' ||
trim(wo.chassis3) || ' ' ||
trim(wo.chassis4) || ' ' ||
trim(wo.chassis5)
) as chassisnumbers,
level as wo_level,
wowob.parent_wo_num,
wowob.parent_wo_line,
wowob.parent_wo_bom_useq
from live.wowob
join live.wo
on wo.ccn = wowob.ccn
and wo.mas_loc = wowob.mas_loc
and wo.wo_num = wowob.wo_num
and wo.wo_line = wowob.wo_line
left join(
select
ccn,
mas_loc,
orderident,
order_line,
lpad(to_char(min(to_number(trim(operation)))), 4, ' ') as min_operation,
operation_start_date
from live.apssplit
where schedule = 'SHOP' and order_type = 'W'
group by ccn, mas_loc, orderident, order_line, operation_start_date
) apssplit_min_operation
on apssplit_min_operation.ccn = wowob.ccn
and apssplit_min_operation.mas_loc = wowob.mas_loc
and apssplit_min_operation.orderident = wowob.wo_num
and apssplit_min_operation.order_line = wowob.wo_line
--Only select open wo's
--Underlying wo's obviously have to start BEFORE their parents, we don't have to check them all for this reason.
where apssplit_min_operation.operation_start_date is not null
and apssplit_min_operation.operation_start_date < sysdate + :days_ahead
--wo.mfg_close_date is null and
--wo.fin_close_date is null and
--wo.ord_qty - wo.scrap_qty - wo.complete_qty > 0
--and wo.start_date < sysdate + :days_ahead
--and wowob.wo_num = ' 334594'
--Grab the childs of only the highest parents.
connect by prior wowob.ccn = wowob.ccn
and prior wowob.mas_loc = wowob.mas_loc
and prior wowob.wo_num = wowob.parent_wo_num
and prior wowob.wo_line = wowob.parent_wo_line
start with wowob.ccn || wowob.mas_loc || wowob.wo_num || wowob.wo_line in (
--Subquery to select all the highest hierarchic wowob's that are still open in wo.
--Performance:
--all: 21253 in ?
--Open only: 174 in 0.155 seconds
select
wowob.ccn || wowob.mas_loc || wowob.wo_num || wowob.wo_line as wowob_key
from live.wowob
--Parent join
left join live.wowob parent_wowob
on wowob.ccn = parent_wowob.ccn
and wowob.mas_loc = parent_wowob.mas_loc
and wowob.parent_wo_num = parent_wowob.wo_num
and wowob.parent_wo_line = parent_wowob.wo_line
--end parent join
where wowob.ccn = :ccn
and wowob.mas_loc = :mas_loc
and parent_wowob.ccn is null
)
union all
--===========================================================================================================================================================================
-- MPS ======================================================================================================================================================================
--===========================================================================================================================================================================
select
'MPS' as type,
wo.ccn,
wo.mas_loc,
wo.wo_num,
wo.wo_line,
apssplit_min_operation.operation_start_date as startdate,
wo.mfg_close_date as mfgclosedate,
trim(
trim(wo.user_alpha2) || ' ' ||
trim(wo.user_alpha3) || ' ' ||
trim(wo.chassis3) || ' ' ||
trim(wo.chassis4) || ' ' ||
trim(wo.chassis5)
) as chassisnumbers,
1 as wo_level,
'' as parent_wo_num,
'' as parent_wo_line,
'' as parent_wo_bom_useq
from live.wo
join live.item_ccn
on item_ccn.ccn = wo.ccn
and item_ccn.item = wo.item
and item_ccn.revision = wo.revision
and item_ccn.mastsched = 'Y' --mps
and item_ccn.planned = ' ' --mrp
and item_ccn.prp = ' ' --NOT prp...
left join(
select
ccn,
mas_loc,
orderident,
order_line,
lpad(to_char(min(to_number(trim(operation)))), 4, ' ') as min_operation,
operation_start_date
from live.apssplit
where schedule = 'SHOP' and order_type = 'W'
group by ccn, mas_loc, orderident, order_line, operation_start_date
) apssplit_min_operation
on apssplit_min_operation.ccn = wo.ccn
and apssplit_min_operation.mas_loc = wo.mas_loc
and apssplit_min_operation.orderident = wo.wo_num
and apssplit_min_operation.order_line = wo.wo_line
--Only select open wo's
--Underlying wo's obviously have to start BEFORE their parents, we don't have to check them all for this reason.
where apssplit_min_operation.operation_start_date is not null
and apssplit_min_operation.operation_start_date < sysdate + :days_ahead
)
order by startdate
) hierarchic_workorders
on hierarchic_workorders.ccn = wo_bom.ccn
and hierarchic_workorders.mas_loc = wo_bom.mas_loc
and hierarchic_workorders.wo_num = wo_bom.wo_num
and hierarchic_workorders.wo_line = wo_bom.wo_line
--===========================================================================================================================================================================
-- Descriptions from wo. wowob and wo_bom are different items and they have different descriptions. =========================================================================
--===========================================================================================================================================================================
left join live.wo
on wo.ccn = hierarchic_workorders.ccn
and wo.mas_loc = hierarchic_workorders.mas_loc
and wo.wo_num = hierarchic_workorders.wo_num
and wo.wo_line = hierarchic_workorders.wo_line
left join live.item wo_item
on wo_item.item = wo.item
and wo_item.revision = wo.revision
left join live.item wo_bom_item
on wo_bom_item.item = wo_bom.item
and wo_bom_item.revision = wo_bom.revision
C# (does NOT work):
using (IDbConnection database = new OracleConnection(_applicationSettings.OracleConnectionString))
{
DynamicParameters parameters = new DynamicParameters();
parameters.Add("ccn", ccn);
parameters.Add("mas_loc", masLoc);
parameters.Add("days_ahead", daysAhead);
return database.Query<HierarchicWoWoBom>(Query, parameters).ToList();
}
C# (does work consistently):
using (IDbConnection database = new OracleConnection(_applicationSettings.OracleConnectionString))
{
DynamicParameters parameters = new DynamicParameters();
parameters.Add("ccn", ccn);
parameters.Add("mas_loc", masLoc);
parameters.Add("days_ahead", daysAhead);
return database.Query<HierarchicWoWoBom>($"-- {DateTime.Now}: randomized comment so that this query keeps working.\n"
+ Query, parameters).ToList();
}
I'm not sure this is really an answer, but it's too long for a comment.
I think a fast first query followed by a slow second query often indicates a statistics/cardinality feedback issue.
Basically while running the query the first time, the optimizer might detect that the estimated cardinality (number of rows) from the current table/index statistics are pretty inaccurate, so it tries to cache more accurate stats for the next run of the same query. But sometimes that actually makes things much worse.
As a quick fix, AskTom suggests that you can try disabling that feature with the /*+ opt_param('_optimizer_use_feedback' 'false') */
hint, or use SQL Plan Management to save the good plan, as Ted mentioned above.
In the longer term, I think it might indicate that some of your statistics might be stale? You can narrow down the problem stats by doing cardinality tuning and looking for where in the plan the actual rows are a lot higher than the expected rows. The basic process is to run your query with the /*+ GATHER_PLAN_STATISTICS */
hint and then do SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
to see the results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With