Source control in SSIS and concurrent work on a .dtsx file

I am building a new SSIS project from scratch, and I want to work on it with a couple of my teammates. I was hoping to get a suggestion on how we can set up source control so that a few of us can work concurrently on the same SSIS project (the same .dtsx file, building new packages). Version: SQL Server Integration Services v11, Microsoft Visual Studio 2010

imba22 asked May 11 '16 19:05


People also ask

What is source control in SSIS?

Each person can work on separate packages, "check in" their changes, and merge changes with other developers when needed. If you put an SSIS project under source control in Team Foundation Server, each of your developers simply downloads the checked-in changes to achieve a "central" build.

How do I debug a Dtsx file?

Steps: Go to the Control Flow, right-click the Data Flow Task (or any task where you want to create a breakpoint), and select Edit Breakpoints. In the Set Breakpoints window, select "Break when the container receives the OnPostExecute event" and click OK.

What is the use of Dtsx files?

Specifies the Data Transformation Services Package File Format (DTSX), which is an XML-based file format that stores the instructions for the processing of a data flow from its points of origin to its points of destination, including transformations and optional processing steps between the origin and destination ...
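Because the format is plain XML, a package can be inspected without opening the designer. The following is a minimal Python sketch, assuming the SSIS 2012+ (v11) format in which the package and each task are `DTS:Executable` elements named by a `DTS:ObjectName` attribute in the `www.microsoft.com/SqlServer/Dts` namespace. The `SAMPLE` package and the `list_executables` helper are hypothetical; real packages carry many more attributes.

```python
import xml.etree.ElementTree as ET

# Assumption: SSIS 2012+ package format, where the package and its tasks are
# DTS:Executable elements carrying a DTS:ObjectName attribute.
DTS = "{www.microsoft.com/SqlServer/Dts}"


def list_executables(dtsx_xml: str) -> list[str]:
    """Return the package name followed by its tasks/containers, in document order."""
    root = ET.fromstring(dtsx_xml)
    # Element.iter() includes the root element itself, so the package comes first.
    return [e.get(f"{DTS}ObjectName") for e in root.iter(f"{DTS}Executable")]


# Hypothetical, heavily trimmed package used only for illustration.
SAMPLE = """<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:ObjectName="LoadSales">
  <DTS:Executables>
    <DTS:Executable DTS:ObjectName="Truncate Staging" />
    <DTS:Executable DTS:ObjectName="Data Flow Task" />
  </DTS:Executables>
</DTS:Executable>"""

print(list_executables(SAMPLE))
```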


2 Answers

In my experience, there are two places where any source control system and an SSIS project can get out of whack: adding new items to the project and making concurrent changes to an existing package.

Adding new items

An SSIS project file has the .dtproj extension. Inside, it's "just" XML defining everything that belongs to the project, at least for 2005/2008 and for 2012+ on the package deployment model. The 2012+ project deployment model carries a good bit more information about the state of the packages in the project.

When you add new packages (or project-level connection managers or .biml files), the internal structure of the .dtproj file is going to change. Diff tools generally don't handle merging XML well, or at all, really. So, to avoid having to merge the project definition, you need to find a strategy that works for your team.
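To see why concurrent additions collide, here is a minimal Python sketch of a three-way check on the project definition. It deliberately avoids assuming any particular .dtproj schema: it simply collects every attribute value or element text ending in `.dtsx`. The helper names and the tiny `BASE`/`MINE`/`THEIRS` documents are hypothetical stand-ins, not real .dtproj content.

```python
import xml.etree.ElementTree as ET


def package_refs(project_xml: str) -> set[str]:
    """Collect every .dtsx name mentioned anywhere in a project definition.

    Schema-agnostic on purpose: scans all attribute values and element text.
    """
    root = ET.fromstring(project_xml)
    refs = set()
    for elem in root.iter():
        candidates = list(elem.attrib.values())
        if elem.text and elem.text.strip():
            candidates.append(elem.text.strip())
        refs.update(c for c in candidates if c.lower().endswith(".dtsx"))
    return refs


def needs_manual_merge(base: str, mine: str, theirs: str) -> bool:
    """True when both sides changed the package list, and not in the same way."""
    base_refs = package_refs(base)
    mine_refs = package_refs(mine)
    theirs_refs = package_refs(theirs)
    return mine_refs != base_refs and theirs_refs != base_refs and mine_refs != theirs_refs


# Hypothetical project definitions: two developers each add one package.
BASE = "<Project><Package Name='DimDate.dtsx'/></Project>"
MINE = "<Project><Package Name='DimDate.dtsx'/><Package Name='DimFoo.dtsx'/></Project>"
THEIRS = "<Project><Package Name='DimDate.dtsx'/><Package Name='FactBlee.dtsx'/></Project>"

print(needs_manual_merge(BASE, MINE, THEIRS))
print(needs_manual_merge(BASE, MINE, BASE))  # only one side changed
```

When the check returns True, someone has to reconcile the project definition by hand, which is the situation the two approaches below are meant to avoid.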

I've seen two approaches work well. The first is to define up front all the packages you think you'll need: DimFoo, DimDate, DimBar, FactBlee. Check in that project and the associated empty packages, and everyone works on what is out there. When the initial cut of packages is complete, make sure everyone is synced up and then add more empty packages to the project. The idea here is that one person, usually the lead, is responsible for changing the "master" project definition, and everyone else consumes their changes.

The other approach requires communication between team members. If you discover a package needs to be added, tell your mates: "I need to add a new package. Has anyone modified the project?" The answer should be no. Once you've announced that a change to the project definition is coming, make it and commit it immediately. The idea here is that people commit and sync (check in, or whatever your tool calls it) very frequently. If you as a developer don't keep your local repository up to date, you're going to be in for a bad time.

Concurrent edits

Don't. Really, that's about it. The general problem with concurrent changes to an SSIS package is that, in addition to the XML diff issue above, SSIS also stores layout data alongside the tasks. I can invert the layout and make things flow from bottom to top or right to left, and there's no material change to the SSIS package, yet the XML still changes. As Siyual notes, "Merging changes in SSIS is nightmare fuel."
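One way to tell layout-only edits from material ones is to drop the designer data before comparing two versions of a package. This is a minimal Python sketch, assuming the SSIS 2012+ package format, where designer layout is stored in a `DTS:DesignTimeProperties` element; the sample package and helper names are hypothetical. It only classifies a change, it does not merge anything, and it ignores other volatile values the designer may rewrite on save (such as version or ID attributes), so two files can still compare as different after stripping.

```python
import xml.etree.ElementTree as ET

# Assumption: SSIS 2012+ format, where designer layout lives in a
# DTS:DesignTimeProperties element in the "www.microsoft.com/SqlServer/Dts" namespace.
DTS_NS = "www.microsoft.com/SqlServer/Dts"
LAYOUT_TAG = f"{{{DTS_NS}}}DesignTimeProperties"


def strip_layout(dtsx_xml: str) -> str:
    """Return the package XML with designer layout elements removed."""
    root = ET.fromstring(dtsx_xml)
    # ElementTree has no parent pointers, so snapshot all elements first and
    # remove any layout children from each parent.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == LAYOUT_TAG:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def materially_equal(a: str, b: str) -> bool:
    """True when two package versions are identical once layout data is removed."""
    return strip_layout(a) == strip_layout(b)


# Hypothetical, heavily trimmed package used only for illustration.
ORIGINAL = (
    '<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:ObjectName="LoadSales">'
    "<DTS:Executables/>"
    "<DTS:DesignTimeProperties>flow: top-to-bottom</DTS:DesignTimeProperties>"
    "</DTS:Executable>"
)
RELAID_OUT = ORIGINAL.replace("top-to-bottom", "bottom-to-top")
RENAMED = ORIGINAL.replace('DTS:ObjectName="LoadSales"', 'DTS:ObjectName="LoadSalesV2"')

print(materially_equal(ORIGINAL, RELAID_OUT))
print(materially_equal(ORIGINAL, RENAMED))
```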

If you find your packages are so large that developers need to make concurrent edits, I would propose that you are doing too much in them. Decompose your packages into smaller, more tightly focused units of work and then control their execution through a parent package. That gives your development and debugging process a finer level of granularity, in addition to avoiding the concurrent edit issue.

billinkc answered Sep 20 '22 16:09


A .dtsx file is basically just an XML file. Compare it to a bunch of people trying to write the same book. The solution I suggest is to use Team Foundation Server for source control. That way everyone can check packages in and out and merge them. If you really don't have that option, try to split your ETL process into logical parts and, at the end, create a master package that calls each subpackage in the right order.

An example: let's say you need to import stock data from one source, branches and other company information from an internal server, and sales amounts from various external sources. After you have gathered all the information, you want to connect it and run some analyses.

You first design the target database entities you need and their relations. One of your team members creates a package that does all the imports into staging tables. Another maybe handles the external sources and parallelizes/optimizes the loading. You would build a package that merges your staging and production tables, maybe historicizing and so on. At the end, you have a master package that calls each of these packages, plus maybe some additional logging.
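The "right order" for the master package can be worked out as a dependency graph. The minimal Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group the example's packages into stages, where packages in the same stage do not depend on each other and could run in parallel. The package names are hypothetical. In SSIS itself, the master package would express this with Execute Package Tasks joined by precedence constraints; Python is used here only to model the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical child packages from the example, each mapped to the packages
# that must finish before it starts.
dependencies = {
    "ImportStockData.dtsx": set(),
    "ImportInternalCompanyInfo.dtsx": set(),
    "ImportExternalSales.dtsx": set(),
    "MergeStagingToProduction.dtsx": {
        "ImportStockData.dtsx",
        "ImportInternalCompanyInfo.dtsx",
        "ImportExternalSales.dtsx",
    },
    "RunAnalyses.dtsx": {"MergeStagingToProduction.dtsx"},
}


def execution_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group packages into stages; everything in one stage is independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the dependencies loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(execution_stages(dependencies), start=1):
    print(f"Stage {number}: {', '.join(stage)}")
```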

Raul answered Sep 20 '22 16:09