I have a table containing stages and sub-stages of certain projects, and a table with specific tasks and estimated costs.
I need some way to aggregate each level (stages/sub-stages), to see how much it costs, but to do it at a minimum performance cost.
To illustrate this, I will use the following data structure:
CREATE TABLE stage
(
id int not null,
fk_parent int
)
CREATE TABLE task
(
id int not null,
fk_stage int not null,
cost decimal(18,2) not null default 0
)
with the following data:
==stage==
id fk_parent
1 null
2 1
3 1
==task==
id fk_stage cost
1 2 100
1 2 200
1 3 600
I want to obtain a table containing the total costs on each branch. Something like this:
Stage ID Total Cost
1 900
2 300
3 600
But, I also want it to be productive. I don't want to end up with extremely bad solutions like The worst algorithm in the world. I mean this is the case. In case I'll request the data for all the items in the stage
table, with the total costs, each total cost will be evaluated D
times, where D
is the depth in the tree (level) at which it is situated. I am afraid I'll hit extremely low performances at large amounts of data with a lot of levels.
SO,
I decided to do something which made me ask this question here.
I decided to add 2 more columns to the stage
table, for caching.
...
calculated_cost decimal(18,2),
date_calculated_cost datetime
...
So what I wanted to do is pass another variable within the code, a datetime
value which equals to the time when this process was started (pretty much unique). That way, if the stage
row already has a date_calculated_cost
which equals to the one I'm carrying, I don't bother calculating it again, and just return the calculated_cost
value.
I couldn't do it with Functions (updates are needed to the stage
table, once costs are calculated)
I couldn't do it with Procedures (recursion within running cursors is a no-go)
I am not sure temporary tables are suitable because it wouldn't allow concurrent requests to the same procedure (which are least likely, but anyway I want to do it the right way)
I couldn't figure out other ways.
I am not expecting a definite answer to my question, but I will reward any good idea, and the best will be chosen as the answer.
1. A way to query the tables to get the aggregated cost.
The first three steps is combined to one statement. It might be good for performance to do the first calculation, cteCost
, to a temp table of it's own and use that temp table in the recursive cteLevel
.
;with cteCost as
(
select s.id,
s.fk_parent,
isnull(sum(t.cost), 0) as cost
from stage as s
left outer join task as t
on s.id = t.fk_stage
group by s.id, s.fk_parent
),
cteLevel as
(
select cc.id,
cc.fk_parent,
cc.cost,
1 as lvl
from cteCost as cc
where cc.fk_parent is null
union all
select cc.id,
cc.fk_parent,
cc.cost,
lvl+1
from cteCost as cc
inner join cteLevel as cl
on cc.fk_parent = cl.id
)
select *
into #task
from cteLevel
create clustered index IX_id on #task (id)
create index IX_lvl on #task (lvl, fk_parent)
declare @lvl int
select @lvl = max(lvl)
from #task
while @lvl > 0
begin
update T1 set
T1.cost = T1.cost + T2.cost
from #task as T1
inner join (select fk_parent, sum(cost) as cost
from #task
where lvl = @lvl
group by fk_parent) as T2
on T1.id = T2.fk_parent
set @lvl = @lvl - 1
end
select id as [Stage ID],
cost as [Total Cost]
from #task
drop table #task
2. A trigger on table task
that maintains a calculated_cost
field in stage
.
create trigger tr_task
on task
after insert, update, delete
as
-- Table to hold the updates
declare @T table
(
id int not null,
cost decimal(18,2) not null default 0
)
-- Get the updates from inserted and deleted tables
insert into @T (id, cost)
select fk_stage, sum(cost)
from (
select fk_stage, cost
from inserted
union all
select fk_stage, -cost
from deleted
) as T
group by fk_stage
declare @id int
select @id = min(id)
from @T
-- For each updated row
while @id is not null
begin
-- Recursive update of stage
with cte as
(
select s.id,
s.fk_parent
from stage as s
where id = @id
union all
select s.id,
s.fk_parent
from stage as s
inner join cte as c
on s.id = c.fk_parent
)
update s set
calculated_cost = s.calculated_cost + t.cost
from stage as s
inner join cte as c
on s.id = c.id
cross apply (select cost
from @T
where id = @id) as t
-- Get the next id
select @id = min(id)
from @T
where id > @id
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With