Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use the pre-copy script from the copy activity to remove records in the sink based on the change tracking table from the source?

I am trying to use change tracking to copy data incrementally from a SQL Server to an Azure SQL Database. I followed the tutorial on Microsoft Azure documentation but I ran into some problems when implementing this for a large number of tables.

In the source part of the copy activity I can use a query that gives me a change table of all the records that are updated, inserted or deleted since the last change tracking version. This table will look something like

PersonID   Age    Name   SYS_CHANGE_OPERATION
---------------------------------------------
1          12     John   U
2          15     James  U
3          NULL   NULL   D
4          25     Jane   I

with PersonID being the primary key for this table.

The problem is that the copy activity can only append the data to the Azure SQL Database so when a record gets updated it gives an error because of a duplicate primary key. I can deal with this problem by letting the copy activity use a stored procedure that merges the data into the table on the Azure SQL Database, but the problem is that I have a large number of tables.

I would like the pre-copy script to delete the deleted and updated records on the Azure SQL Database, but I can't figure out how to do this. Do I need to create separate stored procedures and corresponding table types for each table that I want to copy or is there a way for the pre-copy script to delete records based on the change tracking table?

like image 847
Werner Jongenelis Avatar asked May 30 '18 15:05

Werner Jongenelis


1 Answers

You have to use a LookUp activity before the Copy Activity. With that LookUp activity you can query the database so that you get the deleted and updated PersonIDs, preferably all in one field, separated by comma (so its easier to use in the pre-copy script). More information here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity

Then you can do the following in your pre-copy script:

delete from TableName where PersonID in (@{activity('MyLookUp').output.firstRow.PersonIDs})

This way you will be deleting all the deleted or updated rows before inserting the new ones.

Hope this helped!

like image 108
Martin Esteban Zurita Avatar answered Nov 05 '22 12:11

Martin Esteban Zurita