Database migrations in a complex branching system

In our current development workflow we have introduced database migrations (using Ruckusing) to keep our developers' DB schemas in sync. It works great and is pretty straightforward to use, but now that we have switched to Git as our VCS we are facing the next problem in our database versioning system.

When checking out a branch that has been in development for some time, its database schema may have diverged a lot from the schema in the branch I'm coming from. This causes database conflicts in some cases. Logically it seems that we need to run migrations depending on the branch we were on previously, but that can get complex really fast and will certainly cause problems for some people. And as far as I know there isn't a DB migration system that is branch-aware?

Added complexity comes when switching to a feature branch: we might need to run some migrations up while running others down. Technically this seems impossible using our current DB migration scripts. Are there any sane alternatives? Are there any preferred ways of working with database migrations in a very active and branched development system?

ChrisR asked Jun 20 '11


2 Answers

I think the whole idea of incremental migrations is pretty rotten, really. In a complex environment like yours, it really doesn't work. You could make it work for simple branch patterns, but for anything complicated, it will be a nightmare.

The system I'm working with now takes a different approach: we have no ability to make incremental migrations, only to rebuild the database from a baseline. During initial development, that baseline was an empty database; during maintenance, it's a copy of the live database (restored from a dump). We just have a pile of SQL and XML scripts that we apply to the baseline to get a current system (migrations, essentially, but not designed to be run incrementally). Updating or switching branches is then very simple: nuke the database, load a dump to establish the baseline, run the scripts.
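As an illustration, a rebuild script along these lines might look like the following minimal sketch. The database name, credentials, paths, and the use of the `mysql` client are assumptions for the example, not details from the actual system:

```python
#!/usr/bin/env python
"""Rebuild a dev database from a baseline dump plus ordered scripts (sketch)."""
import subprocess
from pathlib import Path

DB = "app_dev"                    # hypothetical database name
BASELINE = Path("baseline.sql")   # dump of the live system (the baseline)
SCRIPTS = Path("db/scripts")      # SQL scripts applied on top of the baseline

def load_sql(path):
    """Feed a SQL file to the mysql client, aborting on any error."""
    with path.open() as f:
        subprocess.run(["mysql", "-u", "dev", DB], stdin=f, check=True)

def rebuild():
    # 1. Nuke: drop and recreate the database, so the end state is
    #    completely independent of whatever branch we came from.
    subprocess.run(
        ["mysql", "-u", "dev", "-e",
         f"DROP DATABASE IF EXISTS {DB}; CREATE DATABASE {DB};"],
        check=True)
    # 2. Load the baseline dump.
    load_sql(BASELINE)
    # 3. Apply every script on top of the baseline, in a fixed order
    #    (here: lexicographic by file name).
    for script in sorted(SCRIPTS.glob("*.sql")):
        load_sql(script)

if __name__ == "__main__":
    rebuild()
```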

This process is not as quick as just running a few migrations, but it's quick enough. It takes long enough that you can go and get a cup of coffee, but not long enough to get lunch.

The huge advantage is that starting by nuking the database means the process is completely history-independent, so it doesn't need to know or care about crossing branches, going back in time, or anything else.

When you take a release live, you obviously do things slightly differently: you don't nuke the database or load a dump, because the system is already at the baseline (the baseline is defined as the state of the live system!). You just run the scripts. And after that, make a fresh dump to be used as a new baseline for development.

Tom Anderson answered Sep 21 '22


I wouldn't really agree that incremental migrations are rotten. Having a set of homegrown scripts would, in my opinion, be a worse approach; a real tool for the job will make tracking those changes easier. I've had to deal with a similar situation myself, so hopefully I can share some of the insights.

In my experience, RDBMS schemas and branches don't mix very well. Depending on your branching, the schemas should probably be at least somewhat similar, in which case the migrations should not differ too much. Or I might just have misunderstood the full extent of the problem. If you're e.g. trying to keep customer-specific code on a branch, then maybe you should consider a way to modularize it instead. We did something like this, with rules stating that customer-specific schema changes and code could only ever depend on the common code base, not the other way around. We also set the precedence between module changesets based on module and date, so for the most part we knew the order in which the changes were to be applied. YMMV, of course, and it's hard to give specifics without knowing your current setup.
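To make that precedence rule concrete, here is a toy sketch of the ordering; the module names, ranks, and changeset tuples are all made up for the example:

```python
# Order changesets by module precedence first (common code base before
# customer-specific modules), then by date within each module.
MODULE_RANK = {"common": 0, "customer_x": 1}  # hypothetical modules

changesets = [
    ("customer_x", "2011-06-02", "add_invoice_flag.sql"),
    ("common",     "2011-06-05", "create_orders.sql"),
    ("common",     "2011-06-01", "create_customers.sql"),
]

# ISO dates sort correctly as strings, so a (rank, date) key suffices.
for module, date, name in sorted(
        changesets, key=lambda c: (MODULE_RANK[c[0]], c[1])):
    print(module, date, name)
```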

At my old company we successfully used a tool called Liquibase, which sounds similar to what you're using. Basically it is a tool for taking a DB schema, and all the data, from one known state to another known state. The same changeset is applied only once, since Liquibase maintains a changelog with checksums. The changelogs are written in a specific XML format. I can strongly recommend trying it out if you need alternatives.
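For illustration, a changeset in that XML format might look like the following; the table, column, id, and author are made up for the example:

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

  <!-- Each changeSet is applied exactly once: Liquibase records its
       id, author, and checksum in the DATABASECHANGELOG table. -->
  <changeSet id="20110620-01" author="kai">
    <addColumn tableName="customer">
      <column name="loyalty_points" type="int" defaultValueNumeric="0"/>
    </addColumn>
  </changeSet>

</databaseChangeLog>
```

Running `liquibase update` against the database for a given branch then applies only the changesets not yet recorded there.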

Anyway, the way we handled customer code and branches was to have a specific DB/schema for a given branch. This way you could have the schema and data from the branching point, and only migrate the diff to the current situation. We did not undo changes, even though Liquibase in theory could support this, as we felt it was way too cumbersome and error-prone. Given that Liquibase keeps its own state, the migration was always as easy as taking the current state on a given branch and applying everything. Only new changesets were applied, leaving the schema in a good state.

We used Mercurial, which is distributed like Git, so the setup was quite similar. We also had developer-specific local DBs on the dev laptops, and a number of environments, both for different customers and for different phases (development, integration, production), so the model was put to a real test, and it worked surprisingly well. We had some conflicts in the changesets, but we were mostly able to resolve those soon after the problem was introduced. The local development environments were really the hardest part, since during development some schema changes might be introduced that were not always compatible with later changesets, but the structured nature of the changes, and having a known state to revert to, led to very few real problems.

There are a few caveats with this approach:

  1. Any and all changes to the schema must be implemented in the changesets. The biggest cause of confusion was always someone just fiddling around a bit.
  2. The first point applies even if you're using a tool that modifies the schema for you, e.g. an ORM tool like Hibernate. You need to be pretty intimate with such a tool to understand the changes it makes and requires.
  3. All users must buy into this, and be educated to follow the rules. See point 1.
  4. There comes a point when migrating lots of changesets starts taking too much time. At that point you will need to create a new baseline, which can be a bit tricky, especially with a lot of branches. It's good to plan ahead for this as well, and at least know of all existing DB branches.
  5. You need to plan ahead a bit with the branches, to know whether they're going to be merged back to master at some point. Naive merging might not work well for schema changes.
  6. For very long-lived branches and separated datasets this model might not be strong enough.

The point is, however, that the more structure and control you have over the database, the easier migrations will be. Tools like Liquibase can therefore be a really valuable asset to help you track those changes. This applies to complex models to an even greater extent than to simple ones, so please don't consider dumping the tools you already have in place. And do take some time to explore other alternatives.

Some structure and control is better than none, or, even worse, thinking you are in control with a big bunch of manual scripts.

Kai Inkinen answered Sep 23 '22