Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle query only runs consistently when adding a random comment

Tags:

c#

sql

oracle

Edit: Solved! thanks to @kfinity.

AskTom suggests using select /*+ opt_param('_optimizer_use_feedback' 'false') */ at the start of your query to disable the feedback usage. This has fixed the problem for me.

tldr: Adding a randomized comment to a query makes it run consistent, removing this comment breaks it.

Warning: long.

Environment

My preferred way of working is with queries in the source as string so that they are in version control and I can see changes over time. Along with that I use dapper and the oracle.ManagedDataAccess NuGet package. The application in question is a WPF app (in both cases) running on .NET framework 4.7.2. I'm using Visual studio professional 2017 15.9.5.

The problem

About a year back I encountered this problem with a query. I don't remember which it was, I do know that I didn't have the time to document it and post it here. I do now and I ran into the same problem. Back then I somehow figured out that if I restarted my PC or changed the query text, it would run fine again. Just occasionally the symptoms of the problem would show up, I'd add a comment to the query, or remove the one previously there, I would test that specific function every release. I'd test it everytime because if it was faulty on my machine, it would also be faulty on the target user machine. At the time I figured it was a driver/hardware issue with my pc. The way I figured out I could fix it by changing the query text is because I would cut and paste (ctrl-x ctrl-v) the entire query from the code to Oracle developer and edit it there. At some point I noticed even an extra whitespace or enter would make it work again.

Back to now, I've got the problem again. This time it's different because it doesn't fail occasionally anymore. It's very consistent. Thinking back about how I figured this was a driver/hardware issue, I build and ran the application on 3 different machines, all the same results. I can run the query in Oracle developer, using ctrl + end to run the entire query, not just 50 lines. It runs in about 10 seconds. It runs consistently, over and over again, which makes sense if nothing changes.

If I run it from my application, it runs fine - but only once. It takes about 10 seconds as well. If I refresh the data, which runs the query again, it hangs. There are no exceptions, if I break the debugger it's just chilling on the database.Query<>() call. If I either restart my PC or change the query, it will run - exactly once.

v$session_longops/full table scan

Of course, I went to google for help. Found some interesting articles that didn't really help me along:

https://mjsoracleblog.wordpress.com/2014/10/24/oracle-performance-mystery-wildly-varying-query-response-time/

Oracle inconsistent performance behaviour of query

Until I found this:

https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1191435335912

They mention v$session_longops which supposedly gives us insight into long running operations.

select *
from v$session_longops
where time_remaining > 0

I ran this while I had just refreshed the data in the application, causing the query to run for a second time. I see this:

query results

The first query ran fine, indexes where fine. The 2nd time it started up a full table scan. It doesn't need this because it runs fine in the first time and in oracle developer as well. As expected, if I leave it running (takes over 20 minutes) the table scan completes and I get the same result as the first time around.

I found that you can use explain plan for to explain a query plan without using the GUI in Oracle developer. This gave me 2 wildly different plans, the one in Oracle developer always has this note: - 'PLAN_TABLE' is old version. So I'm not sure that I can trust this information, I don't know what to do with it.

Plan from Oracle developer

Plan from code

The fix

Like I said before, adding or removing a comment or rather changing the query text fixes the problem and doesn't start a full table scan, ever. I added a comment containing DateTime.Now to the query so that it is always different. It consistently works. While technically this fixes the problem I think it is quite a ridiculous fix to an even more ridiculous problem. I'd rather know why this happens and if maybe I'm doing something else wrong. So the question is, why does a randomized comment fix my query?

The code

Some context: this is an ERP system, the query gets all the workorders that have a hierarchic structure, or that are just on themselves, combines them, then adds their required materials and some other information like their description. It only does this for workorders that are not closed yet.

SQL:

select
    --Hierarchic info, some aliases exceeded 30 chars and had to be shorted to current state
    hierarchic_workorders.ccn                   as HierarchicCcn ,
    hierarchic_workorders.mas_loc               as HierarchicMasLoc,
    hierarchic_workorders.wo_num                as HierarchicWoNum,
    hierarchic_workorders.wo_line               as HierarchicWoLine,
    wo.item                                     as HierarchicItem,
    wo.revision                                 as HierarchicRevision,    
    wo_item.description                         as HierarchicDescription,
    wo_rtg.wc                                   as HierarchicWorkCenter,
    hierarchic_workorders.startdate             as HierarchicStartDate,
    hierarchic_workorders.mfgclosedate          as HierarchicMfgClosedDate,
    hierarchic_workorders.chassisnumbers        as HierarchicChassisNumbers,
    hierarchic_workorders.wo_level              as HierarchicLevel,
    hierarchic_workorders.parent_wo_num         as HierarchicParentWoNum,
    hierarchic_workorders.parent_wo_line        as HierarchicParentWoLine,
    hierarchic_workorders.parent_wo_bom_useq    as HierarchicParentwobomuseq,

    --wo bom info
    wo_bom.ccn                  as WoRtgCcn,
    wo_bom.mas_loc              as WoRtgMasloc,
    wo_bom.wo_num               as WoRtgWoNum,
    wo_bom.wo_line              as WoRtgWoLine,
    wo_bom.wo_bom_useq          as WoRtgWobomUseq,
    wo_bom.item                 as WoRtgItem,
    wo_bom.revision             as WoRtgRevision,
    wo_bom_item.description     as WoRtgDescription,
    wo_bom.bom_comp_qty         as WoRtgBomCompQty,
    wo_bom.bom_commit           as WoRtgCommit,
    wo_bom.backflush            as WoRtgBackflush,
    wo_bom.issue_qty            as WoRtgIssueQty,
    wo_bom.commit_qty           as WoRtgCommitQty,
    wo_bom.reqd_qty             as WoRtgReqdQty

from live.wo_bom


--===========================================================================================================================================================================
-- Maybe it's possible to remove this or the other min operation join in hierarchic_workorders, to make it faster - not sure if possible ====================================
--===========================================================================================================================================================================
left join(
    select
        wo_rtg_min_operation.min_operation,
        wo_rtg.*
    from live.wo_rtg    
    left join(
        select
            ccn,
            mas_loc,
            wo_num,
            wo_line,
            lpad(to_char(min(to_number(trim(operation)))), 4, ' ')  as min_operation
        from live.wo_rtg
        group by ccn, mas_loc, wo_num, wo_line
    )wo_rtg_min_operation
        on  wo_rtg_min_operation.ccn     = wo_rtg.ccn
        and wo_rtg_min_operation.mas_loc = wo_rtg.mas_loc
        and wo_rtg_min_operation.wo_num  = wo_rtg.wo_num
        and wo_rtg_min_operation.wo_line = wo_rtg.wo_line
) wo_rtg
    on wo_rtg.ccn = wo_bom.ccn
    and wo_rtg.mas_loc = wo_bom.mas_loc
    and wo_rtg.wo_num = wo_bom.wo_num
    and wo_rtg.wo_line = wo_bom.wo_line
    --This case when is painfully slow but it can't really be cached or indexed
    and wo_rtg.operation = (
            case when wo_bom.operation = ' ' then
                wo_rtg.min_operation
            else 
               wo_bom.operation
            end
        )
--===========================================================================================================================================================================
-- Find all open MPS orders and highest hierarchic PRP orders. Having these be a subquery instead of the starting data drastically increases performance ========================
--===========================================================================================================================================================================
join(

    select
        ccn,
        mas_loc,
        wo_num,
        wo_line,
        startdate,
        mfgclosedate,
        chassisnumbers,
        wo_level,
        parent_wo_num,
        parent_wo_line,
        parent_wo_bom_useq
    from (
        --===========================================================================================================================================================================
        -- PRP ======================================================================================================================================================================
        --===========================================================================================================================================================================

        select
            'PRP' as type,

            wowob.ccn,
            wowob.mas_loc,
            wowob.wo_num,
            wowob.wo_line,

            apssplit_min_operation.operation_start_date as startdate,
            wo.mfg_close_date as mfgclosedate,
            trim(
                trim(wo.user_alpha2) || ' ' ||
                trim(wo.user_alpha3) || ' ' ||
                trim(wo.chassis3)    || ' ' ||
                trim(wo.chassis4)    || ' ' ||
                trim(wo.chassis5)
            ) as chassisnumbers,        
            level as wo_level,
            wowob.parent_wo_num,
            wowob.parent_wo_line,
            wowob.parent_wo_bom_useq
        from live.wowob

        join live.wo
            on wo.ccn = wowob.ccn
            and wo.mas_loc = wowob.mas_loc
            and wo.wo_num = wowob.wo_num
            and wo.wo_line = wowob.wo_line
        left join(
             select
                ccn,
                mas_loc,
                orderident,
                order_line,
                lpad(to_char(min(to_number(trim(operation)))), 4, ' ')  as min_operation,
                operation_start_date
            from live.apssplit
            where schedule = 'SHOP' and order_type = 'W'
            group by ccn, mas_loc, orderident, order_line, operation_start_date
        ) apssplit_min_operation
            on  apssplit_min_operation.ccn = wowob.ccn
            and apssplit_min_operation.mas_loc = wowob.mas_loc
            and apssplit_min_operation.orderident = wowob.wo_num
            and apssplit_min_operation.order_line = wowob.wo_line

        --Only select open wo's
        --Underlying wo's obviously have to start BEFORE their parents, we don't have to check them all for this reason.
        where apssplit_min_operation.operation_start_date is not null
        and   apssplit_min_operation.operation_start_date < sysdate + :days_ahead
            --wo.mfg_close_date is null and 
            --wo.fin_close_date is null and
            --wo.ord_qty - wo.scrap_qty - wo.complete_qty > 0
            --and wo.start_date < sysdate + :days_ahead    
            --and wowob.wo_num = '              334594'        

        --Grab the childs of only the highest parents.
        connect by prior wowob.ccn      = wowob.ccn
               and prior wowob.mas_loc  = wowob.mas_loc
               and prior wowob.wo_num   = wowob.parent_wo_num
               and prior wowob.wo_line  = wowob.parent_wo_line
        start with wowob.ccn || wowob.mas_loc || wowob.wo_num || wowob.wo_line in (
            --Subquery to select all the highest hierarchic wowob's that are still open in wo.
            --Performance:
                --all: 21253 in ?
                --Open only: 174 in 0.155 seconds
            select
                 wowob.ccn || wowob.mas_loc || wowob.wo_num || wowob.wo_line as wowob_key
            from live.wowob

            --Parent join
            left join live.wowob parent_wowob
                on wowob.ccn               = parent_wowob.ccn
                and wowob.mas_loc           = parent_wowob.mas_loc
                and wowob.parent_wo_num     = parent_wowob.wo_num
                and wowob.parent_wo_line    = parent_wowob.wo_line
            --end parent join

            where   wowob.ccn =       :ccn
                and wowob.mas_loc = :mas_loc
                and parent_wowob.ccn is null
        )

        union all


        --===========================================================================================================================================================================
        -- MPS ======================================================================================================================================================================
        --===========================================================================================================================================================================

        select 
            'MPS' as type,
            wo.ccn,
            wo.mas_loc,
            wo.wo_num,
            wo.wo_line,

            apssplit_min_operation.operation_start_date as startdate,
            wo.mfg_close_date as mfgclosedate,
            trim(
                trim(wo.user_alpha2) || ' ' ||
                trim(wo.user_alpha3) || ' ' ||
                trim(wo.chassis3)    || ' ' ||
                trim(wo.chassis4)    || ' ' ||
                trim(wo.chassis5)
            ) as chassisnumbers,    
            1 as wo_level,
            '' as parent_wo_num,
            '' as parent_wo_line,
            '' as parent_wo_bom_useq    
        from live.wo
        join live.item_ccn
            on  item_ccn.ccn        = wo.ccn
            and item_ccn.item       = wo.item
            and item_ccn.revision   = wo.revision
            and item_ccn.mastsched = 'Y' --mps
            and item_ccn.planned = ' '   --mrp
            and item_ccn.prp = ' '       --NOT prp...

        left join(
             select
                ccn,
                mas_loc,
                orderident,
                order_line,
                lpad(to_char(min(to_number(trim(operation)))), 4, ' ')  as min_operation,
                operation_start_date
            from live.apssplit
            where schedule = 'SHOP' and order_type = 'W'
            group by ccn, mas_loc, orderident, order_line, operation_start_date
        ) apssplit_min_operation
            on  apssplit_min_operation.ccn = wo.ccn
            and apssplit_min_operation.mas_loc = wo.mas_loc
            and apssplit_min_operation.orderident = wo.wo_num
            and apssplit_min_operation.order_line = wo.wo_line

        --Only select open wo's
        --Underlying wo's obviously have to start BEFORE their parents, we don't have to check them all for this reason.
        where apssplit_min_operation.operation_start_date is not null
        and   apssplit_min_operation.operation_start_date < sysdate + :days_ahead
    )
    order by startdate

) hierarchic_workorders
    on  hierarchic_workorders.ccn       = wo_bom.ccn
    and hierarchic_workorders.mas_loc   = wo_bom.mas_loc
    and hierarchic_workorders.wo_num    = wo_bom.wo_num
    and hierarchic_workorders.wo_line   = wo_bom.wo_line


--===========================================================================================================================================================================
-- Descriptions from wo. wowob and wo_bom are different items and they have different descriptions. =========================================================================
--===========================================================================================================================================================================
left join live.wo
    on  wo.ccn      = hierarchic_workorders.ccn
    and wo.mas_loc  = hierarchic_workorders.mas_loc
    and wo.wo_num   = hierarchic_workorders.wo_num
    and wo.wo_line  = hierarchic_workorders.wo_line

left join live.item wo_item
    on  wo_item.item     = wo.item
    and wo_item.revision = wo.revision

left join live.item wo_bom_item
    on  wo_bom_item.item     = wo_bom.item
    and wo_bom_item.revision = wo_bom.revision

C# (does NOT work):

using (IDbConnection database = new OracleConnection(_applicationSettings.OracleConnectionString))
{
    DynamicParameters parameters = new DynamicParameters();
    parameters.Add("ccn", ccn);
    parameters.Add("mas_loc", masLoc);
    parameters.Add("days_ahead", daysAhead);

    return database.Query<HierarchicWoWoBom>(Query, parameters).ToList();
}

C# (does work consistently):

using (IDbConnection database = new OracleConnection(_applicationSettings.OracleConnectionString))
{
    DynamicParameters parameters = new DynamicParameters();
    parameters.Add("ccn", ccn);
    parameters.Add("mas_loc", masLoc);
    parameters.Add("days_ahead", daysAhead);

    return database.Query<HierarchicWoWoBom>($"-- {DateTime.Now}: randomized comment so that this query keeps working.\n" 
                                      + Query, parameters).ToList();
}
like image 334
o10957295 Avatar asked Nov 06 '22 21:11

o10957295


1 Answers

I'm not sure this is really an answer, but it's too long for a comment.

I think a fast first query followed by a slow second query often indicates a statistics/cardinality feedback issue.

Basically while running the query the first time, the optimizer might detect that the estimated cardinality (number of rows) from the current table/index statistics are pretty inaccurate, so it tries to cache more accurate stats for the next run of the same query. But sometimes that actually makes things much worse.

As a quick fix, AskTom suggests that you can try disabling that feature with the /*+ opt_param('_optimizer_use_feedback' 'false') */ hint, or use SQL Plan Management to save the good plan, as Ted mentioned above.

In the longer term, I think it might indicate that some of your statistics might be stale? You can narrow down the problem stats by doing cardinality tuning and looking for where in the plan the actual rows are a lot higher than the expected rows. The basic process is to run your query with the /*+ GATHER_PLAN_STATISTICS */ hint and then do SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); to see the results.

like image 191
kfinity Avatar answered Nov 14 '22 21:11

kfinity