I have a SQL Server database with some audit records showing changes to a third party database (OpenEdge). I have no control over the structure of the audit data, nor the way the third party database audits data changes. So I'm left with, for example, the following data...
If you follow the first five rows you can see they all belong to TransId 1532102 (represents a database transaction) where the TransSeq represents a database action within a single transaction.
In the columns prefix New
the audit changes are visible. If the value is NULL then no change to that field took place.
Looking at the data you can see that where TransId = 1532102 the PrimaryIdentifier is changed from 2 to -2 (row 1), then from -2 to 3 (row 3), then from 3 to 4 (row 4) and finally from 4 to 5 (row 5). You might also notice that when the PrimaryIdentifier changes from 3 to 4 the SecondaryIdentifier changes from 'abcd' to 'efgh' (row 4). So these multiple changes are actually only occurring on a single source record. So with this in mind rows 1, 3, 4 & 5 can all be condensed into a single row (see below)
Ultimately there are only two record changes in TransId 1532102..
I need to translate these changes into a single UPDATE statement on a target database. In order to do this I need to ensure I have a single record showing the before and after values.
So given the source data presented here I need to produce the following data set..
What query structures could I use to achieve this? I was thinking recursive CTEs or perhaps using Hierarchical structures? Ultimately I need this to perform as well as possible so I wanted to pose the question here in case I hadn't considered all possible approaches.
Thoughts welcome and here's a script for the sample data
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', NULL, 'ghfi', NULL, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
SELECT *
FROM @TestTable
EDIT: I've actually been unable to write any queries that successfully track the identifier changes. Can anyone help - I need a query that tracks the changes in PrimaryIdentifier values and ultimately provides a single record for each tracking with start values and end values.
EDIT 2: There's been a deleted answer that suggests the update to the key identifiers is not possible when condensed and that I should step through the changes instead. I thought it would be valuable to add my comments for further info to the question..
I need to condense the dataset because of the volume of audit records being generated; most of which are unecessary because of the way the source DBMS makes its changes. I need to reduce the dataset and I need to track key identifier changes. The update should be possible without clashing on id change during the update statement - see this example.
You can retrieve data faster and optimize SQL queries by using clustered and non-clustered SQL Server indexes. Indexes can reduce runtime, but it's also important to consider how much disk space they require. In Microsoft SQL Server, you won't need additional disk space for your clustered indexes.
Best practices to improve SQL update statement performance We need to consider the lock escalation mode of the modified table to minimize the usage of too many resources. Analyzing the execution plan may help to resolve performance bottlenecks of the update query. We can remove the redundant indexes on the table.
To speed up frequent sorts, use an int (or an integer-based) data type if possible. SQL Server sorts integer data faster than character data.
I assume that
1) (PrimaryIdentifier, SecondaryIdentifier)
is a PK of the target table,
2) Every transacton in the audit table leaves target table in a consistent state.
So the update of the PK in a single statement for every transaction using case
will run OK:
declare @t table (id int primary key, old int);
insert @t(id, old) values (4,4),(5,5);
update @t set id = case id
when 4 then 5
when 5 then 4 end;
select * from @t;
The plan is 1. Condense transactions 2. Generate update sql into temp table. Then you can run all or selected items from the temp table. Every item is of the form
UPDATE myTable SET
PrimaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 5
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 2 END,
SecondaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'efgh'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 'abcd' END ,
Level= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 2
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Level END ,
Value= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'test data'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Value END
WHERE 1=2 OR (PrimaryIdentifier=2 AND SecondaryIdentifier='abcd')
OR (PrimaryIdentifier=3 AND SecondaryIdentifier='abcd')
The query
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM @TestTable t
WHERE NOT EXISTS (SELECT 1
FROM @TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = t2.NewPrimaryIdentifier
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
t.NewPrimaryIdentifier,
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(root.NewLevel, t.NewLevel), coalesce(root.NewValue, t.NewValue)
FROM root
JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
-- generate sql
SELECT SyncId, TransId, sql = 'UPDATE myTable SET PrimaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ' + CAST(NewPrimaryIdentifier as varchar(20))
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END, SecondaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ''' + NewSecondaryIdentifier + ''''
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Level= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewLevel IS NULL THEN ' Level ' ELSE CAST(NewLevel as varchar(20)) END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Value= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewValue IS NULL THEN ' Value ' ELSE '''' + NewValue + '''' END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END'
+ ' WHERE 1=2'
+ (SELECT ' OR (PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier +''')'
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
INTO #UpdSql
FROM condensed c1
GROUP BY SyncId, TransId
SELECT *
FROM #UpdSql
ORDER BY SyncId, TransId
EDIT
Taking into account NewPrimaryIdentifier
can be NULL too. See added row at @TestTable. Sql generation skipped.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', null, 'ghfi', null, NULL -- added
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
coalesce(NewPrimaryIdentifier, PrimaryIdentifier) AS NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM @TestTable t
WHERE NOT EXISTS (SELECT 1
FROM @TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = coalesce(t2.NewPrimaryIdentifier, t2.PrimaryIdentifier)
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
coalesce(t.NewPrimaryIdentifier, root.NewPrimaryIdentifier),
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(t.NewLevel, root.NewLevel), coalesce(t.NewValue, root.NewValue)
FROM root
JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
SELECT *
FROM condensed
ORDER BY SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
Here is a second stab at producing the originally asked for output. This time using a bunch of CTE:s.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;with baseCTE as (
select SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
isnull(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
isnull(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from @TestTable
),
syncTransEntryPointsCte as (
select *
from baseCTE b
where not exists(
select *
from baseCTE subb
where b.SyncId = subb.SyncId
and b.TransId = subb.TransId
and b.PrimaryIdentifier = subb.NewPrimaryIdentifier
and b.SecondaryIdentifier = subb.NewSecondaryIdentifier
and b.TransSeq > subb.TransSeq
)
)
, recursiveBaseCte as (
select *, 0 as lev, TransSeq as OrigTransSec from syncTransEntryPointsCte
union all
select
c.SyncId, c.TransId, c.TransSeq, p.PrimaryIdentifier, p.SecondaryIdentifier, c.NewPrimaryIdentifier, c.NewSecondaryIdentifier, isnull(c.NewLevel, p.NewLevel), isnull(c.NewValue, p.NewValue),
p.lev + 1,
p.OrigTransSec
from baseCTE c
join recursiveBaseCte as p on (
c.SyncId = p.SyncId and c.TransId = p.TransId and c.PrimaryIdentifier = p.NewPrimaryIdentifier and c.SecondaryIdentifier = p.NewSecondaryIdentifier and c.TransSeq > p.TransSeq
)
)
select r.SyncId, r.TransId, r.OrigTransSec as TransSec,
r.PrimaryIdentifier, r.SecondaryIdentifier,
nullif(r.NewPrimaryIdentifier, r.PrimaryIdentifier) as NewPrimaryIdentifier,
nullif(r.NewSecondaryIdentifier, r.SecondaryIdentifier) as NewSecondaryIdentifier,
r.NewLevel, r.NewValue
from recursiveBaseCte r
join (
select SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier, max(lev) as mlev
from recursiveBaseCte
group by SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier
) as selectForOutput on
r.SyncId = selectForOutput.SyncId
and r.TransId = selectForOutput.TransId
and r.PrimaryIdentifier = selectForOutput.PrimaryIdentifier
and r.SecondaryIdentifier = selectForOutput.SecondaryIdentifier
and r.lev = selectForOutput.mlev
order by 1,2,3
Whether or not the CTE approach is any faster than the cursor based one is difficult to guess. I do suggest you test run this at a suitable time when the server in question is not under heavy load.
Update
The script first declares the baseCTE
which is used just to make sure that we have values in NewPrimaryIdentifier
and NewSecondaryIdentifier
for each row, even if one or both of them were not changed in the update. This makes everything after that easier since we can then join to the next row for the same combination within a specific transaction.
The syncTransEntryPointCte
in turn uses baseCTE
to find all rows within one transaction that were not preceded by another row within the same transaction.
recursiveBaseCte
then uses both of the previous CTE:s to recursively find rows and aggregate changes. The final query then uses it to produce the final output.
The output should be usable for updating a stale copy of the source table if you can manage to do the updates for one condensed transaction in one update statement. If, as I originally assumed, you try to build one update statement for each row in the condensed audit output, it will not work.
Finally, obligatory disclaimer: This seems to work with the test data you gave in the question. I can give no guarantees that it works for the real thing, so use with caution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With