Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I clean up the SSISDB?

When I set this up I overlooked the retention period. My database has become pretty large so I want to decrease it's size. If I simply change the retention period (it was 365) it causes issues with SSIS running my packages. I even changed it in small increments but the deletion statement would create locks which would prevent new jobs from running.

Any ideas how to work around this? I've thought about just creating a new SSISDB.

like image 937
Paul Riker Avatar asked Feb 14 '14 13:02

Paul Riker


People also ask

Does Ssisdb need full recovery?

The default recover model of the SSISDB database is the Full Recovery Model. This means the transaction log will keep growing until the disk is full or until a transaction log backup is taken and the log is truncated.


2 Answers

Phil Brammer ran into this and a host of other things related to the care and feeding of the SSIS catalog, which he covers on his post Catalog Indexing Recommendations.

Root problem

The root problem is that MS attempted to design the SSIS with RI in mind but they were lazy and allowed the cascading deletes to happen versus explicitly handling them.

Out of the box, the new SSIS 2012 catalog database (SSISDB) has some basic indexing applied, with referential integrity set to do cascade deletes between most tables.

Enter the SQL Agent job, “SSIS Server Maintenance Job.” This job by default is set to run at midnight daily, and uses two catalog parameters to function: “Clean Logs Periodically” and “Retention Period (days).” When these are set, the maintenance job purges any data outside of the noted retention period.

This maintenance job deletes, 10 records at a time in a loop, from internal.operations and then cascades into many tables downstream. In our case, we have around 3000 operations records to delete daily (10 at a time!) that translates into 1.6 million rows from internal.operation_messages. That’s just one downstream table! This entire process completely, utterly locks up the SSISDB database from any SELECT/INSERT data

Resolution

Until MS changes up how things work, the supported option is

move the maintenance job schedule to a more appropriate time for your environment

I know at my current client, we only load data in the wee hours so the SSISDB is quiet during business hours.

If running the maintenance job during a quiet period isn't an option, then you're looking at crafting your own delete statements to try to get the cascading deletes to suck less.

At my current client, we've been running a about 200 packages nightly for the past 10 months and are also at 365 days of history. Our biggest tables, by an order of magnitude are.

Schema    Table                   RowCount internal  event_message_context   1,869,028 internal  operation_messages      1,500,811 internal  event_messages          1,500,803 

The driver of all of that data, internal.operations only has 3300 rows in it, which aligns with Phil's comment about how exponentially this data grows.

So, identify the operation_id to be purged and the delete from the leaf tables working back to the core, internal.operations table.

USE SSISDB; SET NOCOUNT ON; IF object_id('tempdb..#DELETE_CANDIDATES') IS NOT NULL BEGIN     DROP TABLE #DELETE_CANDIDATES; END;  CREATE TABLE #DELETE_CANDIDATES (     operation_id bigint NOT NULL PRIMARY KEY );  DECLARE @DaysRetention int = 100; INSERT INTO     #DELETE_CANDIDATES (     operation_id ) SELECT     IO.operation_id FROM     internal.operations AS IO WHERE     IO.start_time < DATEADD(day, -@DaysRetention, CURRENT_TIMESTAMP);  DELETE T FROM     internal.event_message_context AS T     INNER JOIN         #DELETE_CANDIDATES AS DC         ON DC.operation_id = T.operation_id;  DELETE T FROM     internal.event_messages AS T     INNER JOIN         #DELETE_CANDIDATES AS DC         ON DC.operation_id = T.operation_id;  DELETE T FROM     internal.operation_messages AS T     INNER JOIN         #DELETE_CANDIDATES AS DC         ON DC.operation_id = T.operation_id;  -- etc -- Finally, remove the entry from operations  DELETE T FROM     internal.operations AS T     INNER JOIN         #DELETE_CANDIDATES AS DC         ON DC.operation_id = T.operation_id; 

Usual caveats apply

  • don't trust code from randoms on the internet
  • use the diagrams from ssistalk and/or system tables to identify all the dependencies
  • you might need to only segment your delete operations into smaller operations
  • you might benefit by dropping RI for operations but be certain to re-enable them with the check option so they are trusted.
  • consult your dba if operations last longer than 4 hours

July 2020 edit

Tim Mitchell has a good set of articles on SSIS Catalog Automatic Cleanup and A better way to Clean up the SSIS Catalog Database and his fancy new book The SSIS Catalog: Install, Manage, Secure and Monitor Your Enterprise ETL Infrastructure

@Yong Jun Kim noted in the comments

There is a chance SSIS DB might have different table names with scaleout at the end now. Instead of internal.event_message_context it can be internal.event_message_context_scaleout. Instead of internal.operations_messages, it can be internal.operations_messages_scaleout. Just modify the table names in the code accordingly, and it should run fine

This is certainly the case if you are using an SSIS IR within Azure Data Factory. You will find the "normal" tables still present but empty, with the *_scaleout versions containing all the data.

References

  • Catalog Indexing Recommendations
  • Beware the SSIS Server Maintenance Job
  • Slow performance when you run the SSIS Server Maintenance Job to remove old data in SQL Server 2012
like image 129
billinkc Avatar answered Sep 21 '22 00:09

billinkc


I created a similar stored procedure to do the archiving which is below. Let me know if any bugs. I provide no guarantees, but it works fine for me. This code is by no means polished but i wanted to share (i.e. the use of the temp table means can only run once at a time, maybe a session scoped table would be better)

I had a problem where by MS own clean procedure would blow ouut the LDF file and lock the tables for long periods, and once caused the server to run out of space. I decided to write my own to delete the larger tables first and then delete the operation table. This procedure below never uses more than a 1GB in log space, and does not lock the tables for very long periods which is a catch 22 when SSIS jobs must run all day.

First it logs to a table

CREATE TABLE [dbo].[ETL_SSIS_Operations_Archived](     [id] [int] IDENTITY(1,1) NOT NULL,     [operation_id_str] [varchar](900) NOT NULL,     [event_messages_context] [int] NULL,     [event_messages] [int] NULL,     [operation_messages] [int] NULL,     [num_operators] [int] NULL,     [chunksize] [int] NULL,     [DateStarted] [datetime] NOT NULL,     [DateFinished] [datetime] NULL,     [executionSecs] [int] NULL,     [DelOperationsDateStarted] [datetime] NULL,     [DelOperationsDateFinished] [datetime] NULL,     [DelOperationsExecutionSecs] [int] NULL ) ON [PRIMARY] GO 

and uses a temp table

CREATE TABLE [dbo].[tmp_etl_operations_id](     [operation_id] [int] NULL,     [dateCreated] [datetime] NULL default getdate() ) ON [PRIMARY] GO  ALTER TABLE [dbo].[tmp_etl_operations_id] ADD  DEFAULT (getdate()) FOR [dateCreated] GO 

here is the procedure

    CREATE PROCEDURE [dbo].[sp_Archive_SSIDB_Catalogue] AS BEGIN      DECLARE @MyCursor as CURSOR;     DECLARE @l_operation_id int;     declare @l_rows_del int = 1      declare @l_operation_id_str varchar(8000) = ''     declare @l_id int      declare @l_event_message_context int = 0     declare @l_event_messages        int = 0     declare @l_operation_messages    int = 0      declare @l_loop_num int = 1     declare @C_BULK_NUM int = 100     declare @C_CHUNK_SIZE int = 100000      declare @l_last_rec char(1)      SET @MyCursor = CURSOR FOR        with params as        (            -- i round up the midnight that day, just so i know once it is done for the day it is done            -- and if the real maintenance job was to run after this (just for the sake of it to double ensure nothing has been missed), but not actually need to do            -- anything as its already done in here, no new operations would have snuck in due to the sliding system time            SELECT cast(dateadd(day,1,GETDATE() - CONVERT(int,property_value)) as date)  ArchiveDate             FROM  ssisdb.[catalog].[catalog_properties]            WHERE property_name = 'RETENTION_WINDOW'        )        select operation_id,iif(r=c,'Y','N') lastrec        from        (            select operation_id,row_number() over (partition by null order by operation_id) r,count(*) over (partition by null) c            FROM ssisdb.[internal].[operations]             WHERE ( [end_time] <= (select ArchiveDate from params)            -- A special case when END_TIME is null, we will delete the records based on the created time             OR ([end_time] IS NULL AND [status] = 1 AND [created_time] <= (select ArchiveDate from params) ))        ) x        order by operation_id       OPEN @MyCursor;     FETCH NEXT FROM @MyCursor INTO @l_operation_id,@l_last_rec      WHILE @@FETCH_STATUS = 0     BEGIN         set @l_operation_id_str = @l_operation_id_str+','+cast(@l_operation_id as varchar(100))          if @l_loop_num = 1         begin            delete from tmp_etl_operations_id            set @l_operation_id_str = cast(@l_operation_id as varchar(100))         end          insert into tmp_etl_operations_id (operation_id)  select @l_operation_id          if @l_loop_num = @C_BULK_NUM or @l_last_rec='Y'         begin             set @l_loop_num = 1              set @l_event_message_context = 0             set @l_event_messages        = 0             set @l_operation_messages    = 0              insert into ETL_SSIS_Operations_Archived ([operation_id_str], num_operators,chunksize, event_messages_context, event_messages, operation_messages, datestarted)             select @l_operation_id_str, @C_BULK_NUM,@C_CHUNK_SIZE,@l_event_message_context,@l_event_messages,@l_operation_messages,getdate()             --where 0 = (select count(*) from ETL_SSIS_Operations_Archived where operation_id=@l_operation_id_str)              set @l_id = Scope_Identity()               set @l_rows_del = @C_CHUNK_SIZE             while (@l_rows_del >= @C_CHUNK_SIZE)             begin              delete top (@C_CHUNK_SIZE)              from   ssisdb.internal.event_message_context              where  operation_id in (select operation_id from etl..tmp_etl_operations_id)               set @l_rows_del = @@ROWCOUNT              set @l_event_message_context = @l_event_message_context+@l_rows_del               update ETL_SSIS_Operations_Archived               set    event_messages_context = event_messages_context+@l_rows_del              where  id = @l_id--operation_id = @l_operation_id_str              end              set @l_rows_del = @C_CHUNK_SIZE              while (@l_rows_del >= @C_CHUNK_SIZE)             begin              delete top (@C_CHUNK_SIZE)              from   ssisdb.internal.event_messages              where  operation_id in (select operation_id from tmp_etl_operations_id)               set @l_rows_del = @@ROWCOUNT              set @l_event_messages = @l_event_messages+@l_rows_del               update ETL_SSIS_Operations_Archived               set    event_messages = event_messages+@l_rows_del              where  id = @l_id--operation_id = @l_operation_id_strwhere  operation_id = @l_operation_id_str               end              set @l_rows_del = @C_CHUNK_SIZE             while (@l_rows_del >= @C_CHUNK_SIZE)             begin              delete top (@C_CHUNK_SIZE)              from   ssisdb.internal.operation_messages              where  operation_id in (select operation_id from tmp_etl_operations_id)               set @l_rows_del = @@ROWCOUNT              set @l_operation_messages = @l_operation_messages+@l_rows_del               update ETL_SSIS_Operations_Archived               set    operation_messages = operation_messages+@l_rows_del              where  id = @l_id--operation_id = @l_operation_id_strwhere  operation_id = @l_operation_id_str --               end               update ETL_SSIS_Operations_Archived               set    DateFinished = getdate()                    ,executionSecs =  Datediff(s, DateStarted, getdate())                    ,DelOperationsDateStarted = getdate()              where  id = @l_id--operation_id = @l_operation_id_strwhere  operation_id = @l_operation_id_str --                 -- lets delete the operations now              delete --top (@C_CHUNK_SIZE)              from   ssisdb.internal.operations              where  operation_id in (select operation_id from tmp_etl_operations_id)               update ETL_SSIS_Operations_Archived               set    DelOperationsDateFinished = getdate()                    ,DelOperationsExecutionSecs =  Datediff(s, DelOperationsDateStarted, getdate())              where  id = @l_id--operation_id = @l_operation_id_strwhere  operation_id = @l_operation_id_str --           end         else         begin             set @l_loop_num = @l_loop_num+1         end          FETCH NEXT FROM @MyCursor INTO @l_operation_id,@l_last_rec       END      CLOSE @MyCursor;     DEALLOCATE @MyCursor;  END 
like image 24
Ab Bennett Avatar answered Sep 22 '22 00:09

Ab Bennett