We have a specific query that runs a lot slower when inside a proc. I have to add here that it is enclosed within a two level-cursor. However, both cursors have an iteration result-set of one line.
Let me first state things that we tried and failed:
Here is the query as taken from inside the proc/cursors.
select @tpdim1 = dim1, @tpdim2 = dim2, @typecalc = typecalc
from loyalty_policy where code=@loop2_loyalty_policy
Note: @loop2_loyalty_policy is the var taken from the result of the inner cursor, and has one value. code
is PK to the loyalty_policy
table. Thus, @tpdim1 and @tpdim2 have a single value each.
SET STATISTICS PROFILE ON
SET STATISTICS xml on
insert into @tbl_loyal_loop2 (cnt, store, map, pda, insdate, line, item, loyalty_policy_data, loyal_calc, loyalty_policy)
select @cnt, t.store, t.map, t.pda, t.insdate, t.line, t.item, ld.tab_id,
case @typecalc
when 1 then convert(bigint,round(isnull(t.valueFromTran,0.00) * ld.value , 0 ) )
when 2 then convert(bigint,round(isnull(t.netvalue,0.00) * ld.value , 0 ) )
when 3 then convert(bigint,isnull(t.qty,0) * ld.value )
when 4 then convert(bigint,round(isnull(t.valueFromPrice2,0.00) * ld.value , 0 ) )
when 5 then convert(bigint,round(isnull(t.valueFromPrice3,0.00) * ld.value , 0 ) )
when 6 then convert(bigint,round(isnull(t.valueFromPrice4,0.00) * ld.value , 0 ) )
else 0 end
,@loop2_loyalty_policy
from loyalty_policy_data ld-- with (index=ind_loyalty_policy_02)
inner join #tbl_data t on t.insdate >= ld.fdateactive and t.insdate <= ld.tdateactive
where ld.loyalty_policy = @loop2_loyalty_policy
and ld.tdateactive >= @from_rundate and ld.fdateactive <= @to_rundate
and t.dbupddate > @loop1_dbupddate
and
case when @tpdim1 is null then ''
else
case @tpdim1
when 'STORE' then t.store when 'BRAND' then t.brand when 'CAT1' then t.cat1 when 'CAT2' then t.cat2 when 'CAT3' then t.cat3 when 'ITEM' then t.item
when 'CUSTGROUP' then t.custgroup when 'CUSTGROUP2' then t.custgroup2 when 'CUSTGROUP3' then t.custgroup3
when 'CUSTOMER' then @customer
else '' end
end
= case when @tpdim1 is null then '' else ld.dim1 end
and
case when @tpdim2 is null then ''
else
case @tpdim2
when 'STORE' then t.store when 'BRAND' then t.brand when 'CAT1' then t.cat1 when 'CAT2' then t.cat2 when 'CAT3' then t.cat3 when 'ITEM' then t.item
when 'CUSTGROUP' then t.custgroup when 'CUSTGROUP2' then t.custgroup2 when 'CUSTGROUP3' then t.custgroup3
when 'CUSTOMER' then @customer
else '' end
end
= case when @tpdim2 is null then '' else ld.dim2 end
SET STATISTICS xml off
The SET STATISTICS XML
for the above returns this plan.
In trying to debug it, we isolated the query in the following form (here, you can also see how table #a is made, which has exactly the same data with the previous #tbl_data):
drop table #a;
select dt.dbupddate, dt.insdate, dt.map, dt.pda, pt.line, pt.item,
( pt.exp_qty - pt.imp_qty) as qty,
( pt.exp_value + pt.imp_value ) as netvalue,
( (document.exp_val - document.imp_val) * (pt.netvalue - pt.vat_value) ) as valueFromTran,
( (document.exp_val - document.imp_val) * ( ( (pt.qty - pt.qty_gift) * isnull(pt.price2,0.00) ) * (1.00-( pt.disc_perc / 100)) ) ) as valueFromPrice2,
( (document.exp_val - document.imp_val) * ( ( (pt.qty - pt.qty_gift) * isnull(pt.price3,0.00) ) * (1.00-( pt.disc_perc / 100)) ) ) as valueFromPrice3,
( (document.exp_val - document.imp_val) * ( ( (pt.qty - pt.qty_gift) * isnull(pt.price4,0.00) ) * (1.00-( pt.disc_perc / 100)) ) ) as valueFromPrice4,
dt.store, item.brand, item.cat1, item.cat2, item.cat3, customer.custgroup, customer.custgroup2, customer.custgroup3
into #a
from document with (nolock)
inner join dt with (nolock) on dt.doccode = document.code
inner join store with (nolock) on store.code = dt.store and store.calc_loyal = 1
inner join customer with (nolock) on customer.code = dt.customer
inner join pt with (nolock) on dt.map = pt.map and dt.pda=pt.pda
inner join item with (nolock) on item.code = pt.item and item.itemtype in (select code from itemtype with (nolock) where vsales = 1)
where dt.canceled = 0 and document.is_opposite = 0 and document.type = 3 and dt.customer=N'EL4444444'
and dt.insdate >= '20180109' and dt.insdate <= '20190108' ;
SET STATISTICS PROFILE ON
select t.store, t.map, t.pda, t.insdate, t.line, t.item, ld.tab_id,
case 4
when 1 then convert(bigint,round(isnull(t.valueFromTran,0.00) * ld.value , 0 ) )
when 2 then convert(bigint,round(isnull(t.netvalue,0.00) * ld.value , 0 ) )
when 3 then convert(bigint,isnull(t.qty,0) * ld.value )
when 4 then convert(bigint,round(isnull(t.valueFromPrice2,0.00) * ld.value , 0 ) )
when 5 then convert(bigint,round(isnull(t.valueFromPrice3,0.00) * ld.value , 0 ) )
when 6 then convert(bigint,round(isnull(t.valueFromPrice4,0.00) * ld.value , 0 ) )
else 0 end
,'003'
--select count(*)
from loyalty_policy_data ld with (index=ind_loyalty_policy_02)
inner join #a t on t.insdate >= ld.fdateactive and t.insdate <= ld.tdateactive
where ld.loyalty_policy = '003'
--and ld.tdateactive >= '20180109' and ld.fdateactive <= '20190108'
and t.dbupddate > '20000101'
and
case when 'CUSTOMER' is null then ''
else
case 'CUSTOMER'
when 'STORE' then t.store when 'BRAND' then t.brand when 'CAT1' then t.cat1 when 'CAT2' then t.cat2 when 'CAT3' then t.cat3 when 'ITEM' then t.item
when 'CUSTGROUP' then t.custgroup when 'CUSTGROUP2' then t.custgroup2 when 'CUSTGROUP3' then t.custgroup3
when 'CUSTOMER' then 'EL0134366'
else '' end
end
= case when 'CUSTOMER' is null then '' else ld.dim1 end
and
case when 'BRAND' is null then ''
else
case 'BRAND'
when 'STORE' then t.store when 'BRAND' then t.brand when 'CAT1' then t.cat1 when 'CAT2' then t.cat2 when 'CAT3' then t.cat3 when 'ITEM' then t.item
when 'CUSTGROUP' then t.custgroup when 'CUSTGROUP2' then t.custgroup2 when 'CUSTGROUP3' then t.custgroup3
when 'CUSTOMER' then 'EL0134366'
else '' end
end
= case when 'BRAND' is null then '' else ld.dim2 end
SET STATISTICS PROFILE off
And here is the execution plan. This runs a LOT faster.
Why this humongous difference? From my limited knowledge of execution analyzing, I have noticed
index spool
operation, has an estimated rows of ~9700 but actual rows of 3 million. Could you please explain this difference and propose some advice to fix the procedure?
Edit: As Laughing Vergil recommended, I replaced the literals in the second query with variables previously declared, and again it run slow!
Edit 2: I have some additional info from doing some further research.
First, I have isolated the problem to this line:
case when @tpdim1 is null then ''
<-- This uses the slow plan
case when 'CUSTOMER' is null then ''
<-- This uses the fast plan
This is true in the ad-hoc query, no need to trouble ourselves with spcs and/or cursors.
This keeps hapenning even if I change the code to the recommended dynamic where structure.
I haven't created any sampla data yet, but the important info (as can be seen in the plans) is that loyalty_policy_data
has about 720k lines if we only filter by loyalty_policy = @loop2_loyalty_policy
. However, if we evaluate the @tpdim1 condition, which essentially is dim1=N'EL0134366', the rows returned are only 4.
The difference in the plan, then, is when this condition is evaluated in regards to the date-checking conditions.
In the fast plan, it gets evaluated first - When seeking the index for the loyalty policy value, it adds a (non-seek) predicate. While this predicate is not within the index, the returned rows are 4 and all the other operators have "logical" sizes.
In contrast, the slow plan painfully disregards this predicate until too late. If I've figured correctly, it makes a nested loops on loyalty_policy_data as the outer table (which is crazy). It passes the needed columns in as outer references. For each such tuple, the index spool scans the #table(~1k rows) and finds about 250 results, and passes that to the filter which finally does the tpdim1 filtering. Thus, 250*700k rows are passed to the filter operator.
So now I think I know what happens. But I can't figure why.
Stored procedures are precompiled and optimised, which means that the query engine can execute them more rapidly. By contrast, queries in code must be parsed, compiled, and optimised at runtime. This all costs time.
When you run your query for the first time and the data is not in cache, the server read the data from disk. It is time-comsuming. The second time you execute the same query data is already in cache so it requires less time.
Conclusion. Overall, stored procedures outperform dynamic SQL. They are faster, easier to maintain, and require less network traffic. The rule of thumb would suggest using stored procedures in scenarios where you don't have to modify queries, and those queries are not very complex.
after cleaning up the query for readability purposes, I have the following.
insert into @tbl_loyal_loop2
( cnt,
store,
map,
pda,
insdate,
line,
item,
loyalty_policy_data,
loyal_calc,
loyalty_policy
)
select
@cnt,
t.store,
t.map,
t.pda,
t.insdate,
t.line,
t.item,
ld.tab_id,
convert(bigint, round( coalesce(
case @typecalc
when 1 then t.valueFromTran
when 2 then t.netvalue
when 3 then t.qty
when 4 then t.valueFromPrice2
when 5 then t.valueFromPrice3
when 6 then t.valueFromPrice4
else 0
END, 0.00) * ld.value , 0 ) ),
@loop2_loyalty_policy
from
loyalty_policy_data ld -- with (index=ind_loyalty_policy_02)
inner join #tbl_data t
on t.insdate >= ld.fdateactive
and t.insdate <= ld.tdateactive
where
ld.loyalty_policy = @loop2_loyalty_policy
and ld.tdateactive >= @from_rundate
and ld.fdateactive <= @to_rundate
and t.dbupddate > @loop1_dbupddate
and ( @tpdim1 is null
OR ld.dim1 = case @tpdim1
when 'STORE' then t.store
when 'BRAND' then t.brand
when 'CAT1' then t.cat1
when 'CAT2' then t.cat2
when 'CAT3' then t.cat3
when 'ITEM' then t.item
when 'CUSTGROUP' then t.custgroup
when 'CUSTGROUP2' then t.custgroup2
when 'CUSTGROUP3' then t.custgroup3
when 'CUSTOMER' then @customer
else ''
END )
and ( @tpdim2 is null
OR ld.dim2 = case when @tpdim1
when 'STORE' then t.store
when 'BRAND' then t.brand
when 'CAT1' then t.cat1
when 'CAT2' then t.cat2
when 'CAT3' then t.cat3
when 'ITEM' then t.item
when 'CUSTGROUP' then t.custgroup
when 'CUSTGROUP2' then t.custgroup2
when 'CUSTGROUP3' then t.custgroup3
when 'CUSTOMER' then @customer
else ''
END )
In addition, I would make sure you have a composite index on your loyalty_policy_data table... index on ( loyalty_policy, tdateactive, fdateactive, dbupddate, dim1, dim2 )
This way you are qualifying all the fields used in your WHERE filtering criteria. Don't rely on an index of just the key... but key PLUS the dates will help optimize the specific date range without having to go back to the raw data pages, but can optimize the query JOIN conditions based on values in the INDEX.
As for your temporary table #tbl_data, ensure you have an index on ( insdate ) as that is the only JOIN basis criteria (in case you did not already have an index on that table).
COMMENT --
From your comment about the slow vs fast query based on the null of
@tpdim1 = NULL vs 'CUSTOMER' = NULL
a fixed string 'CUSTOMER' is NEVER null, so it never has to consider it down the null path. Fixed string 'CUSTOMER' vs the @customer variable being null or the being compared to in the case/when of ld.dim1 and ld.dim2 respectively being compared to null... maybe what needs to be tested for should be changed from
and ( @tpdim1 is null
OR ld.dim1 = case @tpdim1
when 'STORE' then t.store
when 'BRAND' then t.brand ... end
)
to
and ld.dim1 = case @tpdim1
when NULL then ''
when 'STORE' then t.store
when 'BRAND' then t.brand ... end
Same with the ld.dim2 case/when. Include the "NULL" as first tested value for the @tpdim1 (and @tpdim2) tests.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With