Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sql Server Change Data Capture: Preserving history when adding columns?

Tags:

sql-server

cdc

When a new column is added to table that is configured for change data capture (cdc), the capture instance table will not have the new column until cdc is disabled and re-enabled for the source table. In the process the existing capture instance is dropped.

I thought I could copy existing data out to a temp table and then copy back using the following SQL. However, other CDC meta information, such as the cdc.change_tables.start_lsn, becomes invalid.

How can the capture instance history be preserved, using the same capture instance name, if at all?

Thanks, Rich

/*Change Data Capture Test - Alter table definition test */

/*Enter restricted mode so we don't lose data changes during this process*/
alter database ChangeDataCaptureTest set AUTO_UPDATE_STATISTICS_ASYNC OFF
alter database ChangeDataCaptureTest set RESTRICTED_USER with ROLLBACK IMMEDIATE
go

/*Add a column to the table*/
alter table dbo.Table1 add value3 varchar(20) DEFAULT '' not null

/*Copy the existing change tracking into a temp table*/
select * into cdc.dbo_Table1_temp from cdc.dbo_Table1_CT

/*Add the new column to the temp table so that we don't have to map
all columns when we copy back, note that we use NULL as the default*/
alter table cdc.dbo_Table1_temp add value3 varchar(20) DEFAULT NULL

/*Disable CDC on the source table, this will drop the associated cdc table*/
exec sys.sp_cdc_disable_table 
@source_schema='dbo',
@source_name='Table1', 
@capture_instance='dbo_Table1'

/*Enable CDC for the table which recreates the CDC table*/
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'Table1',
@role_name     = NULL,
@supports_net_changes = 1,
@filegroup_name = N'ChangeDataCapture'
GO

/*Insert values from the temp table back into the new CDC Table*/
Insert into cdc.dbo_Table1_CT 
SELECT * 
From cdc.dbo_Table1_temp
go

/*Drop the temp table*/
drop table cdc.dbo_Table1_temp

/*Go back into multi-user mode*/
alter database ChangeDataCaptureTest set AUTO_UPDATE_STATISTICS_ASYNC ON
alter database ChangeDataCaptureTest set MULTI_USER
go

/*Add a new row to the table*/
insert into table1
values(12,'zz','g')
like image 931
Richard Collette Avatar asked Apr 16 '10 17:04

Richard Collette


People also ask

Does CDC capture schema changes?

The cdc schema contains the change data capture metadata tables and, after source tables are enabled for change data capture, the individual change tables serve as a repository for change data. The cdc schema also contains associated system functions used to query for change data.

Does Change Data Capture affect performance?

Trigger-based CDC can impact the performance of the source database because triggers run on the database tables as data changes are made. With every transaction, it takes compute cycles to record the change in a separate table, so the system is slowed by this extra processing.

How do I disable CDC?

To disable CDC tracking for a table, use the sys. sp_cdc_disable_table stored procedure. Specify a capture instance to disable it.

How do you edit data capture?

Change Data Capture (CDC) can be implemented using the following 3 steps: Step 1: Extract the Data. Step 2: Transform the Data. Step 3: Load the Data.


1 Answers

Rich,

The best method to preserve such data is to create a staging persisted table to capture the _CT table data periodically. Knowing that cdc data generally has a short shelf-life prior to be consumed by the endpoint (warehouse/data mart, etc.) you can ensure that any changes are completed within a maintenance window at which time the _CT table data is copied off into staging prior to the changes being implemented.

The one aspect to consider in this is that once the _CT schema has been changed (by adding or removing one or more columns) the process used to pull that data out into the endpoint must also be updated.

To overcome this we implemented a script store that stores the intended schema of the staging table (used between _CT and endpoint) and once the changes are implemented on the client DB, then we move the data from staging into endpoint and update the staging schema.

Hopefully this will provide food for thought.

like image 92
LogicalMan Avatar answered Sep 22 '22 23:09

LogicalMan