
SQL version control methodology

There are several questions on SO about version control for SQL and lots of resources on the web, but I can't find something that quite covers what I'm trying to do.

First off, I'm talking about a methodology here. I'm familiar with the various source control applications out there and with tools like Red Gate's SQL Compare, and I know how to write an application to check things in and out of my source control system automatically. If there is a tool that would be particularly helpful in providing a whole new methodology, or that has useful and uncommon functionality, then great, but for the tasks mentioned above I'm already set.

The requirements that I'm trying to meet are:

  • The database schema and look-up table data are versioned
  • DML scripts for data fixes to larger tables are versioned
  • A server can be promoted from version N to version N + X where X may not always be 1
  • Code isn't duplicated within the version control system - for example, if I add a column to a table I don't want to have to make sure that the change is in both a create script and an alter script
  • The system needs to support multiple clients who are at various versions for the application (trying to get them all up to within 1 or 2 releases, but not there yet)

Some organizations keep incremental change scripts in their version control and to get from version N to N + 3 you would have to run scripts for N->N+1 then N+1->N+2 then N+2->N+3. Some of these scripts can be repetitive (for example, a column is added but then later it is altered to change the data type). We're trying to avoid that repetitiveness since some of the client DBs can be very large, so these changes might take longer than necessary.

Some organizations simply keep a full database build script at each version level, then use a tool like SQL Compare to bring a database up to one of those versions. The difficulty here is intermixing DML scripts. Imagine a scenario where I add a column, use a DML script to fill said column, then in a later version that column name is changed.

Perhaps there is some hybrid solution? Maybe I'm just asking for too much? Any ideas or suggestions would be greatly appreciated though.

If the moderators think that this would be more appropriate as a community wiki, please let me know.

Thanks!

Tom H asked May 05 '10 15:05


1 Answer

I struggled with this for several years before recently adopting a strategy that seems to work pretty well. Key points I live by:

  • The database doesn't need to be independently versioned from the app
  • All database update scripts should be idempotent

As a result, I no longer create any kind of version tables. I simply add changes to a numbered sequence of .sql files that can be applied at any given time without corrupting the database. If it makes things easier, I'll write a simple installer screen for the app to allow administrators to run these scripts whenever they like.

Of course, this method does impose a few requirements on the database design:

  • All schema changes are done through script - no GUI work.
  • Extra care must be taken to ensure all keys, constraints, etc. are explicitly named so they can be referenced by a later update script, if necessary.
  • All update scripts should check for existing conditions.
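
To illustrate why the naming and existence-check rules matter, here is a minimal sketch of what a later update script might look like if a default constraint had to be removed. The scenario is hypothetical; it only assumes the table dbo.Registrations and a default constraint explicitly named df_registrations_faxoptin, as in the examples from the project.

```sql
-- Hypothetical later script: drop a default that an earlier script created by name.
-- Had the constraint been left unnamed, SQL Server would have generated a name
-- that differs from database to database, so it could not be referenced here.
if exists
(
    select 1
      from sys.default_constraints
     where [name] = 'df_registrations_faxoptin'
       and parent_object_id = object_id(N'dbo.Registrations')
)
    alter table dbo.Registrations
        drop constraint df_registrations_faxoptin;
```

Because the drop is guarded by the existence check, running the script a second time is a no-op rather than an error.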

Examples from a recent project:

001.sql:

if object_id(N'dbo.Registrations') is null 
begin
    create table dbo.Registrations
    (
        [Id]                    uniqueidentifier not null,
        [SourceA]               nvarchar(50)     null,
        [SourceB]               nvarchar(50)     null,
        [Title]                 nvarchar(50)     not null,
        [Occupation]            nvarchar(50)     not null,
        [EmailAddress]          nvarchar(100)    not null,
        [FirstName]             nvarchar(50)     not null,
        [LastName]              nvarchar(50)     not null,
        [ClinicName]            nvarchar(200)    not null,
        [ClinicAddress]         nvarchar(50)     not null,
        [ClinicCity]            nvarchar(50)     not null,
        [ClinicState]           nchar(2)         not null,
        [ClinicPostal]          nvarchar(10)     not null,
        [ClinicPhoneNumber]     nvarchar(10)     not null,
        [ClinicPhoneExtension]  nvarchar(10)     not null,
        [ClinicFaxNumber]       nvarchar(10)     not null,
        [NumberOfVets]          int              not null,  
        [IpAddress]             nvarchar(20)     not null,
        [MailOptIn]             bit              not null,
        [EmailOptIn]            bit              not null,
        [Created]               datetime         not null,
        [Modified]              datetime         not null,
        [Deleted]               datetime         null
    );
end

if not exists(select 1 from information_schema.table_constraints where constraint_name = 'pk_registrations')
    alter table dbo.Registrations add
        constraint pk_registrations primary key nonclustered (Id);

if not exists (select 1 from sysindexes where [name] = 'ix_registrations_created')
    create clustered index ix_registrations_created
        on dbo.Registrations(Created);

if not exists (select 1 from sysindexes where [name] = 'ix_registrations_email')
    create index ix_registrations_email
        on dbo.Registrations(EmailAddress);

if not exists (select 1 from sysindexes where [name] = 'ix_registrations_name_and_clinic')
    create index ix_registrations_name_and_clinic
        on dbo.Registrations (FirstName,
                              LastName,
                              ClinicName);

002.sql:

/**********************************************************************
  The original schema allowed null for these columns, but we don't want
  that, so update existing nulls and change the columns to disallow 
  null values
 *********************************************************************/

update dbo.Registrations set SourceA = '' where SourceA is null;
update dbo.Registrations set SourceB = '' where SourceB is null;
alter table dbo.Registrations alter column SourceA nvarchar(50) not null;
alter table dbo.Registrations alter column SourceB nvarchar(50) not null;

/**********************************************************************
  The client wanted to modify the signup form to include a fax opt-in
 *********************************************************************/

if not exists 
(
    select 1 
      from information_schema.columns
     where table_schema = 'dbo'
       and table_name   = 'Registrations'
       and column_name  = 'FaxOptIn'
)
alter table dbo.Registrations 
    add FaxOptIn bit null 
        constraint df_registrations_faxoptin default 0;

003.sql, 004.sql, etc...

At any given time I can run the entire series of scripts against the database in any state and know that things will be immediately brought up to speed with the current version of the app. Because everything is scripted, it's much easier to build a simple installer to do this, and adding the schema changes to source control is no problem at all.
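
One way to run the whole series without writing an installer is a small master script executed with the sqlcmd utility. This is only a sketch, not what the project above necessarily used: the file name run_all.sql is made up, and the server and database names in the comment are placeholders. It relies on sqlcmd's :r command, which includes another script file, and :on error exit, which stops at the first failure.

```sql
-- run_all.sql (hypothetical name). Run from the folder that holds the scripts:
--   sqlcmd -S <server> -d <database> -i run_all.sql
-- Stop at the first error instead of continuing with later scripts.
:on error exit

-- Include each numbered script in order; "go" ends the batch after each file.
:r 001.sql
go
:r 002.sql
go
-- ...add one :r line per new numbered script as it is created.
```

Since every script checks for existing conditions, the same master script can be pointed at a fresh database or at one that is several releases behind.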

Chris answered Nov 02 '22 00:11