I have a situation where I need to configure existing client data to address a problem where our application was not correctly updating IDs in a table when it should have been.
Here's the scenario. We have a parent table, where rows can be inserted that effectively replace existing rows; the replacement can be recursive. We also have a child table, which has a field that points to the parent table. In existing data, the child table could be pointing at rows that have been replaced, and I need to correct that. I can't simply update each row to the replacing row, however, because that row could have been replaced as well, and I need the latest row to be reflected.
I was trying to find a way to write a CTE that would accomplish this for me, but I'm struggling to find a query that finds what I'm actually looking for. Here's a sample of the tables that I'm working with; the 'ShouldBe' column is what I'd like my update query to end up with, taking into account the recursive replacement of some of the rows.
DECLARE @parent TABLE (SampleID int,
SampleIDReplace int,
GroupID char(1))
INSERT INTO @parent (SampleID, SampleIDReplace, GroupID)
VALUES (1, -1, 'A'), (2, 1, 'A'), (3, -1, 'A'),
(4, -1, 'A'), (5, 4, 'A'), (6, 5, 'A'),
(7, -1, 'B'), (8, 7, 'B'), (9, 8, 'B')
DECLARE @child TABLE (ChildID int, ParentID int)
INSERT INTO @child (ChildID, ParentID)
VALUES (1, 4), (2, 7), (3, 1), (4, 3)
Desired results in child table, after the update script has been applied:
ChildID ParentID ParentID_ShouldBe
1 4 6 (4 replaced by 5, 5 replaced by 6)
2 7 9 (7 replaced by 8, 8 replaced by 9)
3 1 2 (1 replaced by 2)
4 3 3 (unchanged, never replaced)
A recursive CTE references itself. It returns the result subset, then it repeatedly (recursively) references itself, and stops when it returns all the results.
Recursion is achieved by WITH statement, in SQL jargon called Common Table Expression (CTE). It allows to name the result and reference it within other queries sometime later. Naming the result and referencing it within other queries.
A recursive CTE is one that references itself within that CTE. The recursive CTE is useful when working with hierarchical data as the CTE continues to execute until the query returns the entire hierarchy. A typical example of hierarchical data is a table that includes a list of employees.
A recursive association connects a single class type (serving in one role) to itself (serving in another role). Example: In most companies, each employee (except the CEO) is supervised by one manager. Of course, not all employees are managers.
The following returns what you are looking for:
with cte as (
select sampleid, sampleidreplace, 1 as num
from @parent
where sampleidreplace <> -1
union all
select p.sampleid, cte.sampleidreplace, cte.num+1
from @parent p join
cte
on p.sampleidreplace = cte.sampleId
)
select c.*, coalesce(p.sampleid, c.parentid)
from @child c left outer join
(select ROW_NUMBER() over (partition by sampleidreplace order by num desc) as seqnum, *
from cte
) p
on c.ParentID = p.SampleIDReplace and p.seqnum = 1
The recursive part keeps track of every correspondence (4-->5, 4-->6). The addition number is a "generation" count. We actually want the last generation. This is identified by using the row_number()
function, ordering by the num in decreasing order -- hence the p.seqnum = 1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With