Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Databases and "branch"

We are currently developping an application which use a database.

Every time we update the database structure, we have to provide a script to update the database from the previous version to the current one.

So the database has currently a number that gave us it's current version and then our software make an update when we want to use an "old" database.

The issue we are encountering is when we have branches:

When we create a new big feature, that will not be available for users(and not included in releases), we create a branch.

The main branch(trunk) will be merged regularly to ensure that the create brunch has the latest bug corrections.

Here is some illustration: enter image description here

The issue is with our update scripts. They update from the previous version to the current one, then update the version number of the database.

Imagine that we have the DB version 17 when creating the branch.

We then do the branch, and make changes on the Trunk DB. The DB has now the version 18.

Then we make a db change on the branch. Since we know there has already been a new version "18", we create the version 19 and the updater 18->19.

Then the trunk is merged on the branch.

At this very moment we may have some updaters that will never runs.

If someone updated his database before the merge, his database will be flagged has having the version 19, the the update 17->18 will never be done.

We want to change this behavior but we can't find how:

Our constraints are:

  • We are unable to make all changes on the same branch
  • Sometimes we have more than just 2 branchs, and we can only merge from the trunk to the feature branch until the feature is finished

What can we do to ensure a continuity between our database branch?

like image 975
J4N Avatar asked Apr 16 '14 11:04

J4N


People also ask

What is data branch?

The purpose of the Data Branch is to manage and support databases and software applications used in the Operations Section and to receive, manage, and analyze information about the infectious disease emergency that can guide the selection of strategies to contain the event.

What is a database call?

AMB Database (DB) calls are predefined, easy-to-use statements with common syntax that allows transparent access to a variety of databases. The AMB DB calls let you focus on what needs to be accomplished, rather than the mechanics of the target environment.

What is SQL branching?

To branch with SQL Source Control, you must create the branch using your source control system and then link to the appropriate branch. Once you've created a branch, you're ready to work on it in SQL Server Management Studio with SQL Source Control.


2 Answers

I think the easiest way is to use the Ruby-on-rails approach. Every DB change is a separate script file, no matter how small. Each script file is numbered, and when you do an upgrade you simply run each script from the number your DB currently is to the last one.

What this means in practice is that your DB version system stops being v18 to v19, and starts being v18.0 to v18.01, then v18.02 etc. What you release to the customer may get rolled up into a big v19 upgrade script, but as you develop, you will be making many, many small upgrades.

You'll have to modify this slightly to work for your system, each script will either have to be renumbered as it gets merged to the branch or you will have to ensure the upgrade scripts don't simply track the last upgrade number, but track each upgrade number so missing holes will still get filled in as the script gets merged across.

You will also have to roll up these little upgrades into the next major number as you create the release tag (on the trunk first) to keep things sane.


edit: so fundamentally you first havew to get rid of the notion of using a upgrade sdcript to go from version to version. For example, if you start with a table, and trunk adds column A and the branch adds column B, then you merge trunk to branch - you cannot realistically "upgrade" to the version with both, unless the branch version number is always greater than the trunk's upgrade script, and that doesn't work if you subsequently merge trunk to the branch. So you must therefore scrap the idea of a "version" that applies to development branches. The only way round that is to update each change independently, and track each change individually. Then you can say you need the "last main release plus colA plus colB" (admittedly if you merge trunk in, you can take the current main release from trunk whether its v18 or v19, but you still need to apply each branch update individually).

So you start with trunk at DB v18. Branch and make changes. Then you merge trunk later, where the DB is at v19. Your earlier branch changes still need to be applied (or should already be applied, but you may need to write a branch-update script with all branch changes in it, if you re-create your DB). Note the branch does not have a "v20" version number at all, and the branches changes are not made to a single update script like you have on trunk. You can add these changes you make on branch as a single script if you like (or 1 script of 'since the last trunk merge' changes) or as many little scripts. When the branch is complete, the very last task is to take all the DB changes made for the branch and toll them up into a script that can be applied to the master upgrader, and when it is merged onto trunk, that script is merged into the current upgrade script and the DB version number bumped.

There is an alternative that may work for you, but I found it to be a little flaky when you try to update DBs with data, sometimes it just couldn't manage to do the update and the DB had to be wiped and re-created (which, to be fair, is probably what would have had to happen if I used SQL scripts at the time). That's to use Visual Studio Database project. This stores every part of the schema as a file, so you'll have 1 script per table. These will be hidden from you by Visual Studio itself that will show you designers instead of scripts but they're stored as files in version control. VS can deploy the project and will try to upgrade your DB if it already exists. Be careful of the options, many defaults say "drop and create" instead of using alter to update an existing table.

These projects can generate a (largely machine-readable) SQL script for deployment, we used to generate these and deliver them to a DBA team who didn't use VS and only accepted SQL.

And lastly, there's Roundhouse which is not something I've used but it might help you to become the new upgrader "script". Its a free project and I've read its more powerful and easier to use than VS DB projects. Its a DB versioning and change management tool, integrates with VS, and uses SQL scripts.

like image 115
gbjbaanb Avatar answered Sep 21 '22 15:09

gbjbaanb


We use the following procedure for about 1.5 years now. I don't know if this is the best solution, but we didn't have any trouble with it (except some human errors in a delta-file like forgetting a USE-statement).

It has some simularities with the answer that Krumia gave, but differs in the point that in this approach only new change scripts/delta files are executed. This makes it a lot easier to write those files.

Delta files

Write all the DB-changes you make for a feature in a delta-file. You can have multiple statements in one delta-file or split them up into multiple. Once committed that file it's best (and once merged it's necessary) to start a new one and leave the old one untouched.

Put all the delta-files in one directory and give them a name-pattern like YYYY-MM-DD-HH.mm.description.sql. It's essential that you can sort them in time (therefore the timestamp) so you know what file needs to be executed first. Besides that you don't want to have a merge conflict with those files so it should be unique (over all branches).

Merging/pulling

Create a merge-script (for examlpe a bash-script) that performs the following actions:

  1. Note the current commit-hash
  2. Do the actual merge (or pull)
  3. Get a list of all the delta-files that are added with this merge (git diff --stat $old_hash..HEAD -- path/to/delta-files)
  4. Execute those delta-files, in the order specified by the timestamp

By using git to determine what files are new (and thus what database-actions aren't executed yet on the current branch) you are not longer bound to version-numbering.

Alternating delta-files

It might happen that within one merge delta-files from different branches may be 'new to execute' and that those files alternate like this:

  1. 2014-08-04-delta-from-feature_A.sql
  2. 2014-08-05-delta-from-feature_B.sql
  3. 2014-08-06-delta-from-feature_A.sql

As the timestamp determines the execution-order there will be first added something from feature A, then feature B, then back again to feature A. When you write proper delta-files, that are executable by themself/stand-alone, that shouldn't be a problem.

like image 26
Peter van der Wal Avatar answered Sep 21 '22 15:09

Peter van der Wal