I am part of a development team currently working with a database that does not have any kind of source control. We work with SQL Server 2008 R2 and have always managed the DB directly with SSMS. It now has ~340 tables and ~1600 stored procedures, plus a few triggers and views, so it is not a small DB.
My goal is to have the DB under version control, so I have been reading articles, like Scott Allen's series and many old SO related questions. But I am still unable to decide on how to proceed.
What I'm thinking of is to script the database schema in one file, then procedures, triggers and views in one file each. Then keep everything versioned under Mercurial. But of course, every member of the team can access SSMS and directly change the schema and procedures, with the possibility that any of us can forget to replicate those changes in the versioned files.
What better options are there? And, did I forget any element worth having source control of? My biggest concern is that most of the literature I found explains how to do version control when creating a new database, but not when it is already old and relatively big.
The General Process
We create a baseline for a particular version (say, v1.0
). A baseline includes one complete schema creation script, as well an upgrade script from allowed previous versions, if any (more on that in a moment). So for v1.0
, we'd have just one script:
baseline-v1.0.sql
From that baseline, we create incremental change scripts as we work from the previous baseline. These scripts are created in a way that they are reentrant, so that they can be run safely multiple times (where the first time only does any actual work; see the next paragraph on a suggestion how). We just create a file for each change script with the baseline name and a timestamp (which we call the version). So for example, say we create two change scripts after a baseline. We'd have the following files:
baseline-v1.0.sql (for creating new installations)
baseline-v1.0-201211071220.sql (created on Nov. 7, 2012 at 12:20 PM UTC)
baseline-v1.0-201211122019.sql (created on Nov. 12, 2012 at 8:00 PM UTC)
We create a schema_version
table that has two columns: baseline
and version
. baseline
is some label (such as v1.0
mentioned above), and version
is just a timestamp of when the change script was created (we chose to do this because creating arbitrary version numbers created annoying administrative overhead, where a timestamp was easy to use). So before running the change script, we check to see if the change script has been applied yet, by querying for it by baseline
and version
. If it's already present, just return out of the script or whatever. Otherwise, apply the change and insert into the schema_version
table to mark the change script completed.
Example change script:
-- Created by <developer> on Nov. 7, 2012 at 12:20 PM UTC
declare @schema_baseline varchar(10), @schema_version varchar(12)
set @schema_baseline = 'v1.0'
set @schema_version = '201211071210'
if exists (select 1 from schema_version where baseline = @schema_baseline and version = @schema_version = @schema_version) return 0
-- begin change script
-- place your schema changes here
-- end change script
insert into schema_version(@schema_baseline, @schema_version)
Now, when we actually install the software, we run the relevant baseline
script. As we upgrade that version, we just apply the change scripts in order.
When we hit a significant milestone in our product development phase, we create a new baseline. So, we create a new baseline script (again, this is a snapshot of the DB as a baseline), plus an upgrade script from the previous baseline. So let's say we have a new baseline, v2.0
, we'd have the following files:
baseline-v2.0.sql (for creating new installations)
baseline-v2.0-upgrade-v1.0.sql (for upgrading from v1.0)
Then the process continues.
How We Apply Changes
The scripts are all kept in source control. We do have a tool that packages these files and automatically upgrades databases, which our support and installation teams use. The tool figures out the current baseline of the target database, and asks the user if they wish to upgrade to the baseline in the package. If they do, and there is a valid upgrade path from the current version, it applies the upgrade script, and updates the schema_version.baseline
, and deletes all entries for change scripts from the previous baseline. If the database is new, it applies the regular baseline script. Either way, after the baseline is achieved, it applies all change scripts from the baseline that are present in the package, one at a time, in order, in a transaction. If a particular change script fails, it rolls back the last set of changes and errors out. We look at the log, fix any issues, then rerun the package again. At that point, it should just pick up at the last change script that succeeded, saving time.
Automation and Diff Tools
We do not allow diff tools to upgrade production databases directly. It's just too risky. We do use diff tools, of course, to help create our upgrade and change scripts, but once we have them, we comb through them, massage them, test them, etc., then create the upgrade or change script according to the specs above. We do use tools/shell scripts to create the change script files and put the boiler plate schema_version
checking.
Caveats
It's actually pretty straight-forward and it works well. The only time it really gets tricky is with branches. For the most part, branches are handled well. If we need a change script for a particular branch's work, it will fold into the mainline very well once we merge the branch back in. No problem. Where it gets tricky is when two branches try to do similar things, or where one branch relies on another. That's mostly a process and planning issue, though. If we get stuck in such a situation, we just create a new baseline (say v2.1
), then update the branches accordingly.
Another thing to keep in mind is if an installation wants to be upgraded from one baseline to another, it has to apply all outstanding changes for the current baseline, before we upgrade to the new one. In other words, we don't let installations jump right from where ever they are to the next baseline (unless, of course, they're already at the most recent version for the current baseline).
I would recommend SQL Server Data Tools and/or a Visual Studio SQL database project. It will reverse engineer your existing DB to code(sql) files that can be version controlled and gives many other niceties (publishing, comparison, etc)
We developed SQL Source Control specifically to solve the problem you describe. It extends SSMS to provide a link between your SQL Server schema objects (and static data) and your existing source control system.
http://www.red-gate.com/products/sql-development/sql-source-control/
If you need any more information, we'd be very pleased to help (contact [email protected])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With