
On Heroku, is there danger in a Django syncdb / South migrate after the instance has already restarted with changed model code?

On Heroku, as soon as you push new code, the web-serving instances restart... even if the underlying database schema additions/changes (via syncdb or south migrate) haven't yet been applied.

In many cases, this might just cause harmless errors until the syncdb/migrate is run soon afterward. But I'm concerned that in some cases, new code might half-work, making unexpected changes in the pre-migration database.

What's the right way to be safe against this risk?

One technique might be to add the syncdb/migrate to the Procfile so it's run before web restart. But, in the case of multiple instances, or maybe even a case where the one old-code-instance is left running until the moment the one new-code-instance is known-up, there's still a variant of the issue where code is talking to a DB with a mismatched schema.

Is there a 'hold all web instances' feature (or common best practice) for letting the migrate complete without web traffic?

Or am I being overly concerned about a risk that is negligible in practice?

gojomo asked Jan 24 '12 20:01



3 Answers

The safest way to handle migrations of this nature, Heroku or no, is to strictly adopt a compatibility approach with your schema and code:

  • Every additive or transformative schema change must be backwards-compatible;
  • Every destructive schema change must be performed after the code that depends on it has been removed;
  • Every code change must either be:
    • durable against the possibility that associated schema changes have not yet been made (for instance, removing a model or a field on a model) or
    • made only after the associated schema change has been performed (adding a model or a field on a model)

If you need to make a significant transformation of a model, this approach might require the following steps:

  • Create a new database table to hold your new model structure, and deploy that migration
  • Create a new model with the new structure, and code to copy changes from the old model to the new model when the old model changes, and deploy that code
  • Execute a migration or code action to copy all old model data to the new model
  • Update your codebase to use the new model rather than the old model, deleting the old model, and deploy that code
  • Execute a migration to delete the old model structure from the database
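The five steps above can be sketched outside Django with nothing but the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the `person` and `person_v2` tables, their columns, and every function name are hypothetical, and in a real project each step would be a separate South migration or deploy rather than a function call in one process.

```python
import sqlite3


def split_name(full_name):
    """Split 'First Last' into (first, last); last is '' when absent."""
    first, _, last = full_name.partition(" ")
    return first, last


def create_new_table(conn):
    # Step 1: additive migration. Old code never touches this table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person_v2 ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )


def save_person(conn, person_id, full_name):
    # Step 2: dual-write. The old table is still the source of truth,
    # and each change is mirrored into the new structure.
    conn.execute(
        "INSERT OR REPLACE INTO person (id, full_name) VALUES (?, ?)",
        (person_id, full_name),
    )
    first, last = split_name(full_name)
    conn.execute(
        "INSERT OR REPLACE INTO person_v2 (id, first_name, last_name) "
        "VALUES (?, ?, ?)",
        (person_id, first, last),
    )


def backfill(conn):
    # Step 3: copy rows written before dual-writing began.
    rows = conn.execute(
        "SELECT id, full_name FROM person "
        "WHERE id NOT IN (SELECT id FROM person_v2)"
    ).fetchall()
    for person_id, full_name in rows:
        first, last = split_name(full_name)
        conn.execute(
            "INSERT INTO person_v2 (id, first_name, last_name) "
            "VALUES (?, ?, ?)",
            (person_id, first, last),
        )
    return len(rows)


def drop_old_table(conn):
    # Step 5: destructive change, run only once no deployed code
    # reads or writes `person`.
    conn.execute("DROP TABLE person")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO person VALUES (1, 'Ada Lovelace')")  # legacy row
    create_new_table(conn)
    save_person(conn, 2, "Alan Turing")  # written by the dual-writing code
    copied = backfill(conn)
    drop_old_table(conn)
    # Step 4: reads now come from the new structure only.
    rows = conn.execute(
        "SELECT id, first_name, last_name FROM person_v2 ORDER BY id"
    ).fetchall()
    return copied, rows
```

At every point in `demo()` the code that is "deployed" can run against the schema that exists, which is the property the list is trying to guarantee.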

With some thought and planning, the same compatibility approach can be used for more drastic changes as well:

  • Deploy code that completely removes dependence on a section of the database, presumably replacing those sections of the site with maintenance pages
  • Deploy a migration that makes drastic changes that would not for whatever reason work with the above dual-model workflow
  • Deploy code that brings the affected sections back with the new model structure supported

This can be hard to organize and requires strict discipline and firm understanding of your code's interaction with your database, but in practice, it does allow for most changes to be made with no more downtime than the server restart itself imposes.

JdV answered Nov 03 '22 18:11



Looks like fast-database changeovers are the way to go, but it requires a dedicated database.

http://devcenter.heroku.com/articles/fast-database-changeovers

Alternatively, here's a tutorial for copying the data from one database (e.g., production) to another database (e.g., staging), doing the schema/data migration (e.g., using django/south), then switching the app to use the newly-updated database instance.

http://devcenter.heroku.com/articles/migrating-data-between-plans

Seems reasonable, but potentially slow if there's a large amount of data.

jdve answered Nov 03 '22 18:11



The recommended method is this:

  • Add database changes for your new features to your existing code
  • Make the existing code compatible with the new schema
  • Deploy
  • Add the new features to your codebase
  • Deploy

This means that your database changes are already in place when the code starts to require them.
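As a rough illustration of that ordering, the sketch below uses the standard-library `sqlite3` module rather than Django. The `account` table, the `nickname` column, and the function names are all made up for the example. The point it tries to show is that a nullable, additive column lets the existing insert path keep working after the first deploy, so the feature code can follow in a second deploy.

```python
import sqlite3


def migrate_add_nickname(conn):
    # Deploy 1: additive schema change. The column is nullable, so
    # inserts that do not mention it keep working.
    conn.execute("ALTER TABLE account ADD COLUMN nickname TEXT")


def old_code_create_account(conn, email):
    # Existing code path: names its columns explicitly and knows
    # nothing about `nickname`.
    conn.execute("INSERT INTO account (email) VALUES (?)", (email,))


def new_code_create_account(conn, email, nickname):
    # Deploy 2: feature code that requires the column from deploy 1.
    conn.execute(
        "INSERT INTO account (email, nickname) VALUES (?, ?)",
        (email, nickname),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )
    old_code_create_account(conn, "a@example.com")   # old code, old schema
    migrate_add_nickname(conn)
    old_code_create_account(conn, "b@example.com")   # old code, new schema
    new_code_create_account(conn, "c@example.com", "cee")  # new code
    return conn.execute(
        "SELECT email, nickname FROM account ORDER BY id"
    ).fetchall()
```

Had the new column been declared `NOT NULL` with no database-level default, the old insert would fail as soon as the migration ran, which is the kind of incompatibility the first two steps are meant to rule out.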

However...

There are a couple of issues with this. First, I know of no development shop that is organised enough to be able to handle this, as features just get built ad hoc; and second, you're not really saving anything.

Generally speaking, unless you're making big changes to a massive database, your changes won't take long to apply and are usually over in a couple of seconds, which a developer can work around quite happily by issuing restarts etc. when needed. The risk is that a user might get an error page. If the changes are larger, you have some alternatives. One is using maintenance mode to turn the site off for a few seconds.
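For the maintenance-mode route, a small wrapper script is one way to keep the steps in order. The sketch below is an assumption-laden example, not an official recipe: the app name, the `heroku` git remote, and the `master` branch are placeholders, and it assumes the `heroku` CLI is installed and that `heroku run` accepts `--app` before the command. The `runner` parameter exists only so the sequence can be inspected without executing anything.

```python
import subprocess


def deploy_commands(app, remote="heroku", branch="master", south=True):
    """Ordered CLI invocations for a deploy behind maintenance mode.

    `app`, `remote` and `branch` are placeholders for your own setup.
    """
    schema_step = "migrate" if south else "syncdb"
    return [
        ["heroku", "maintenance:on", "--app", app],
        ["git", "push", remote, branch],
        ["heroku", "run", "--app", app, "python", "manage.py", schema_step],
        ["heroku", "maintenance:off", "--app", app],
    ]


def deploy(app, runner=subprocess.check_call, **options):
    # check_call raises on a non-zero exit status, so a failed push or
    # migration stops the sequence and leaves maintenance mode on.
    for command in deploy_commands(app, **options):
        runner(command)
```

Passing `runner=print` gives a dry run that only prints the four commands. Leaving maintenance mode on after a failure is a deliberate choice here, so that no traffic reaches code whose schema change did not complete.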

To be honest, there is no clear-cut way to handle this nicely, as by definition your code needs to be in place for your database changes to start. The best way I've found to approach the problem is to look at each change individually and work out the smoothest path for each on a case-by-case basis.

Rehearsing deployments on a staging environment will mitigate the risk of a deploy going bad, and give you an idea of the impact.

Neil Middleton answered Nov 03 '22 18:11
