Database migrations are a popular pattern, particularly with Ruby on Rails. Since migrations specify how to mold old data to fit a new schema, they can be helpful when you have production data that must be converted quickly and reliably.
But migrating models in App Engine is hard: processing all entities sequentially is difficult, and there is no offline operation that migrates everything in one big transaction.
Here is what I do.
I have a MigratingModel class, which all of my models inherit from. Here is migrating_model.py:
"""Models which know how to migrate themselves"""
import logging

from google.appengine.ext import db
from google.appengine.api import memcache


class MigrationError(Exception):
    """Error migrating"""


class MigratingModel(db.Model):
    """A model which knows how to migrate itself.

    Subclasses must define a class-level migration_version integer attribute.
    """

    current_migration_version = db.IntegerProperty(required=True, default=0)

    def __init__(self, *args, **kw):
        if not kw.get('_from_entity'):
            # Assume newly-created entities needn't migrate.
            try:
                kw.setdefault('current_migration_version',
                              self.__class__.migration_version)
            except AttributeError:
                msg = ('migration_version required for %s'
                       % self.__class__.__name__)
                logging.critical(msg)
                raise MigrationError(msg)
        super(MigratingModel, self).__init__(*args, **kw)

    @classmethod
    def from_entity(cls, *args, **kw):
        # from_entity() calls __init__() with _from_entity=True
        obj = super(MigratingModel, cls).from_entity(*args, **kw)
        return obj.migrate()

    def migrate(self):
        target_version = self.__class__.migration_version
        if self.current_migration_version < target_version:
            migrations = range(self.current_migration_version+1, target_version+1)
            for self.current_migration_version in migrations:
                method_name = 'migrate_%d' % self.current_migration_version
                logging.debug('%s migrating to %d: %s'
                              % (self.__class__.__name__,
                                 self.current_migration_version, method_name))
                getattr(self, method_name)()
            # Save once, after all pending migrations have run.
            db.put(self)
        return self
MigratingModel intercepts the conversion from the raw datastore entity to the full db.Model instance. If current_migration_version has fallen behind the class's latest migration_version, then it runs a series of migrate_N() methods which do the heavy lifting.
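To make the version-stepping concrete without a datastore, here is a minimal plain-Python sketch of the same loop. FakeEntity and its applied list are hypothetical stand-ins invented for illustration; they are not App Engine classes, and nothing is saved anywhere.

```python
"""Plain-Python sketch of the migrate() version-stepping loop (no datastore)."""


class FakeEntity(object):
    """Hypothetical stand-in for a stored entity; not an App Engine class."""

    migration_version = 3  # latest version the code knows about

    def __init__(self, stored_version):
        self.current_migration_version = stored_version
        self.applied = []  # records which steps ran, for illustration only

    def migrate(self):
        target = self.__class__.migration_version
        # Run every missing step in order: stored version 1 runs 2, then 3.
        for version in range(self.current_migration_version + 1, target + 1):
            getattr(self, 'migrate_%d' % version)()
            self.current_migration_version = version
        return self

    def migrate_1(self):
        self.applied.append(1)

    def migrate_2(self):
        self.applied.append(2)

    def migrate_3(self):
        self.applied.append(3)


stale = FakeEntity(stored_version=1).migrate()  # runs migrate_2, migrate_3
fresh = FakeEntity(stored_version=3).migrate()  # already current, runs nothing
```

An entity stored at version 1 only runs the steps it is missing, and an up-to-date entity passes through untouched, which is why each migrate_N() only has to handle the change from version N-1 to N.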
For example:
"""Migrating model example"""
# ...imports...


class User(MigratingModel):
    migration_version = 3

    name = db.StringProperty()  # deprecated: use first_name and last_name
    first_name = db.StringProperty()
    last_name = db.StringProperty()
    age = db.IntegerProperty()
    invalid = db.BooleanProperty()  # to search for bad users

    def migrate_1(self):
        """Convert the unified name to dedicated first/last properties."""
        self.first_name, self.last_name = self.name.split()

    def migrate_2(self):
        """Ensure the users' names are capitalized."""
        self.first_name = self.first_name.capitalize()
        self.last_name = self.last_name.capitalize()

    def migrate_3(self):
        """Detect invalid accounts"""
        if self.age < 0 or self.age > 85:
            self.invalid = True
On a busy site, the migrate() method should retry if db.put() fails, and possibly log a critical error if the migration didn't work.
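One way the retry might look is sketched below. put_with_retry, its put, attempts and delay parameters, and the flaky_put test double are all hypothetical names for this sketch. In migrate() the put argument would be db.put, and the except clause would probably be narrowed to the datastore's own exception types rather than a bare Exception.

```python
import logging
import time


def put_with_retry(put, entity, attempts=3, delay=0.1):
    """Call put(entity), retrying on failure; re-raise after the last attempt.

    All names here are hypothetical. Catching Exception is an assumption made
    so the sketch runs anywhere; real code should catch datastore errors only.
    """
    for attempt in range(1, attempts + 1):
        try:
            return put(entity)
        except Exception:
            if attempt == attempts:
                logging.critical('migration not saved after %d attempts',
                                 attempts)
                raise
            logging.warning('put failed (attempt %d), retrying', attempt)
            time.sleep(delay * attempt)  # simple linear backoff


# Usage: a fake put that fails twice, then succeeds.
calls = []


def flaky_put(entity):
    calls.append(entity)
    if len(calls) < 3:
        raise RuntimeError('simulated datastore timeout')
    return 'saved'


result = put_with_retry(flaky_put, {'name': 'Ada'}, attempts=3, delay=0)
```

Passing the put function in as an argument keeps the helper testable without the App Engine SDK.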
I haven't gotten there yet, but at some point I would probably mix in my migrations from a separate file.
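A rough sketch of what that mix-in arrangement could look like, in plain Python so it runs without App Engine. UserMigrations and BaseModel are hypothetical names; BaseModel merely stands in for MigratingModel, and I have not tested how a mix-in like this interacts with db.Model's metaclass.

```python
class UserMigrations(object):
    """Hypothetical mix-in that would live in its own module."""

    migration_version = 2

    def migrate_1(self):
        """Convert the unified name to dedicated first/last properties."""
        self.first_name, self.last_name = self.name.split()

    def migrate_2(self):
        """Ensure the user's names are capitalized."""
        self.first_name = self.first_name.capitalize()
        self.last_name = self.last_name.capitalize()


class BaseModel(object):
    """Stand-in for MigratingModel so the sketch runs without App Engine."""

    def __init__(self, **props):
        self.current_migration_version = 0
        self.__dict__.update(props)

    def migrate(self):
        target = self.__class__.migration_version
        for version in range(self.current_migration_version + 1, target + 1):
            getattr(self, 'migrate_%d' % version)()
            self.current_migration_version = version
        return self


class User(UserMigrations, BaseModel):
    """The model body stays about properties; migrations come from the mix-in."""


user = User(name='ada lovelace').migrate()
```

The appeal is that the model file only declares properties, while the growing list of migrate_N() methods and the migration_version counter sit together elsewhere.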
Testing on App Engine is hard: it is difficult to get access to your production data in a test environment, and at this time it is difficult-to-impossible to make a coherent snapshot backup. Therefore, for major changes, consider making a new version that uses a completely different model name which imports from the old model and migrates as it needs (for example, User2 instead of User). That way, if you need to fall back to the previous version, you have an effective backup of the data.
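The import-on-demand idea can be sketched with two dictionaries standing in for the User and User2 kinds. Everything here (old_users, new_users, get_user2) is hypothetical and only illustrates the flow; a real version would read and write datastore entities instead.

```python
# Stand-ins for the two datastore kinds; plain dicts keyed by entity key.
old_users = {'u1': {'name': 'ada lovelace'}}  # the original User kind
new_users = {}                                # the new User2 kind


def get_user2(key):
    """Return the User2 record, importing it from User on first access."""
    if key not in new_users:
        old = old_users[key]  # read only: the old record is never modified
        first, last = old['name'].split()
        new_users[key] = {'first_name': first, 'last_name': last}
    return new_users[key]


converted = get_user2('u1')
```

Because the old records are only read and never rewritten, rolling back to the previous version of the app still finds its data exactly as it left it.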