Database migrations in a complex branching system

In our current development workflow we have introduced database migrations (using Ruckusing) to keep our developers' DB schemas in sync. It works great and is pretty straightforward to use, but now that we have switched to Git as our VCS we are facing the next problem in our database versioning system.

When checking out a branch that has been in development for some time, its database schema may have diverged a lot from the schema in the branch I'm coming from. This causes database conflicts in some cases. Logically it seems that we need to run migrations depending on the branch we were on previously, but that can get complex really fast and will certainly cause problems for some people. And as far as I know there isn't a DB migration system that is branch-aware?

Added complexity comes when switching to a feature branch: we might need to run some migrations up while running others down. Technically this seems impossible using our current DB migration scripts. Are there any sane alternatives? Are there any preferred ways of working with database migrations in a very active and branched development system?

ChrisR asked Jun 20 '11


2 Answers

I think the whole idea of incremental migrations is pretty rotten, really. In a complex environment like yours, it really doesn't work. You could make it work for simple branch patterns, but for anything complicated, it will be a nightmare.

The system I'm working with now takes a different approach: we have no ability to make incremental migrations, only to rebuild the database from a baseline. During initial development, that baseline was an empty database; during maintenance, it's a copy of the live database (restored from a dump). We just have a pile of SQL and XML scripts that we apply to the baseline to get a current system (migrations, essentially, but not designed to be run incrementally). Updating or switching branches is then very simple: nuke the database, load a dump to establish the baseline, run the scripts.
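As an illustration, a rebuild script along these lines might look like the following minimal sketch. The database name, credentials, paths, and the use of the `mysql` client are assumptions for the example, not details from the actual system:

```python
#!/usr/bin/env python
"""Rebuild a dev database from a baseline dump plus ordered scripts (sketch)."""
import subprocess
from pathlib import Path

DB = "app_dev"                    # hypothetical database name
BASELINE = Path("baseline.sql")   # dump of the live system (the baseline)
SCRIPTS = Path("db/scripts")      # SQL scripts applied on top of the baseline

def load_sql(path):
    """Feed a SQL file to the mysql client, aborting on any error."""
    with path.open() as f:
        subprocess.run(["mysql", "-u", "dev", DB], stdin=f, check=True)

def rebuild():
    # 1. Nuke: drop and recreate the database, so the end state is
    #    completely independent of whatever branch we came from.
    subprocess.run(
        ["mysql", "-u", "dev", "-e",
         f"DROP DATABASE IF EXISTS {DB}; CREATE DATABASE {DB};"],
        check=True)
    # 2. Load the baseline dump.
    load_sql(BASELINE)
    # 3. Apply every script on top of the baseline, in a fixed order
    #    (here: lexicographic by file name).
    for script in sorted(SCRIPTS.glob("*.sql")):
        load_sql(script)

if __name__ == "__main__":
    rebuild()
```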

This process is not as quick as just running a few migrations, but it's quick enough. It takes long enough that you can go and get a cup of coffee, but not long enough to get lunch.

The huge advantage is that starting by nuking the database means the process is completely history-independent, so it doesn't need to know or care about crossing branches, going back in time, or anything else.

When you take a release live, you obviously do things slightly differently: you don't nuke the database or load a dump, because the system is already at the baseline (the baseline is defined as the state of the live system!). You just run the scripts. And after that, make a fresh dump to be used as a new baseline for development.

Tom Anderson answered Sep 21 '22


I wouldn't really agree that incremental migrations are rotten. Having a set of homegrown scripts would, in my opinion, be a worse approach; a real tool for the job will make tracking those changes easier. I've had to deal with a similar situation myself, so hopefully I can share some of the insights.

In my experience, RDBMS schemas and branches don't mix very well. Depending on your branching, the schemas should probably be at least somewhat similar, in which case the migrations should not differ too much. Or I might just have misunderstood the full extent of the problem. If you're e.g. trying to keep customer-specific code on a branch, then maybe you should consider a way to modularize it instead. We did something like this, with rules stating that customer-specific schema changes and code could only ever depend on the common code base, not the other way around. We also set the precedence between module changesets based on module and date, so for the most part we knew the order in which the changes were to be applied. YMMV, of course, and it's hard to give specifics without knowing your current setup.
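To make that precedence rule concrete, here is a toy sketch of the ordering; the module names, ranks, and changeset tuples are all made up for the example:

```python
# Order changesets by module precedence first (common code base before
# customer-specific modules), then by date within each module.
MODULE_RANK = {"common": 0, "customer_x": 1}  # hypothetical modules

changesets = [
    ("customer_x", "2011-06-02", "add_invoice_flag.sql"),
    ("common",     "2011-06-05", "create_orders.sql"),
    ("common",     "2011-06-01", "create_customers.sql"),
]

# ISO dates sort correctly as strings, so a (rank, date) key suffices.
for module, date, name in sorted(
        changesets, key=lambda c: (MODULE_RANK[c[0]], c[1])):
    print(module, date, name)
```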

At my old company we successfully used a tool called Liquibase, which sounds similar to what you're using. Basically it is a tool for taking a DB schema, and all the data, from one known state to another known state. The same changeset is applied only once, since Liquibase maintains a changelog with checksums. The changelogs are written in a specific XML format. I can strongly recommend trying it out if you need alternatives.
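For illustration, a changeset in that XML format might look like the following; the table, column, id, and author are made up for the example:

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

  <!-- Each changeSet is applied exactly once: Liquibase records its
       id, author, and checksum in the DATABASECHANGELOG table. -->
  <changeSet id="20110620-01" author="kai">
    <addColumn tableName="customer">
      <column name="loyalty_points" type="int" defaultValueNumeric="0"/>
    </addColumn>
  </changeSet>

</databaseChangeLog>
```

Running `liquibase update` against the database for a given branch then applies only the changesets not yet recorded there.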

Anyway, the way we handled customer code and branches was to have a specific DB/schema for a given branch. This way you could have the schema and data from the branching point, and only migrate the diff to the current situation. We did not undo changes, even though Liquibase in theory could support this, as we felt it was way too cumbersome and error-prone. Given that Liquibase keeps its own state, the migration was always as easy as taking the current state on a given branch and applying everything. Only new changesets were applied, leaving the schema in a good state.

We used Mercurial, which is distributed like Git, so the setup was quite similar. We also had developer-specific local DBs on the dev laptops, and a number of environments, both for different customers and for different phases (development, integration, production), so the model was put to a real test, and it worked surprisingly well. We had some conflicts in the changesets, but we were mostly able to resolve those soon after the problem was introduced. The local development environments were really the hardest part, since during development some schema changes might be introduced that were not always compatible with later changesets, but the structured nature of the changes, and having a known state to revert to, led to very few real problems.

There are a few caveats with this approach:

  1. Any and all changes to the schema must be implemented in the changesets. The biggest cause of confusion was always someone just fiddling around a bit.
  2. The first point applies even if you're using a tool that modifies the schema for you, e.g. an ORM tool like Hibernate. You need to be pretty intimate with such a tool to understand the changes it makes and requires.
  3. All users must buy into this, and be educated to follow the rules. See point 1.
  4. There comes a point when migrating lots of changesets starts taking too much time. At that point you will need to create a new baseline, which can be a bit tricky, especially with a lot of branches. It's good to plan ahead for this as well, and at least know of all existing DB branches.
  5. You need to plan ahead a bit with the branches, to know whether they're going to be merged back to master at some point. Naive merging might not work well for schema changes.
  6. For very long-lived branches and separated datasets this model might not be strong enough.

The point is, however, that the more structure and control you have over the database, the easier migrations will be. Tools like Liquibase can therefore be a really valuable asset to help you track those changes. This applies to complex models to an even greater extent than to simple ones, so please don't consider dumping the tools you already have in place. And do take some time to explore other alternatives.

Some structure and control is better than none, or, even worse, thinking you are in control with a big bunch of manual scripts.

Kai Inkinen answered Sep 23 '22