 

Data transition over multiple application versions

When upgrading a GAE application, what is the best way to upgrade the data model?

The application's version number lets you keep multiple versions apart, but all of these versions share the same datastore (according to How to change application after deployed into Google App Engine?). So what happens when I upload a version of the application with a different data model (I'm thinking of Python here, but the question should also be valid for Java)? I guess it isn't a problem if the changes only add a nullable field and some new classes, since the existing model can be extended without harm. But what if the data model changes are more profound? Do I actually lose the existing data if it becomes inconsistent with the new data model?

The only option I see at the moment is to put the datastore into read-only maintenance mode, transform the data offline, and deploy the whole thing again.

DuXati asked Oct 06 '11 07:10



1 Answer

There are a few ways of dealing with that, and they are not mutually exclusive:

  • Make non-breaking changes to your datastore and work around the issues they create. Inserting new fields into existing model classes, switching fields from required to optional, adding new models, etc. - none of these break compatibility with existing entities. But since those entities do not magically change to conform to the new model (remember, the datastore is a schema-less DB), you might need legacy code that partially supports the old model. For example, if you have added a new field, you will want to access it via getattr(entity, "field_name", default_value) rather than entity.field_name so that it doesn't result in an AttributeError for old entities.
  • Gradually convert the entities to the new format. This is quite simple: whenever you come across an entity that still uses the old model, make the appropriate changes. In the example above, you would put the entity back with the new field added:

    if not hasattr(entity, "field_name"):
        entity.field_name = default_value
        entity.put()
    val = entity.field_name # no getattr'ing needed now
    

    Ideally, all your entities will eventually be processed in this manner and you will be able to remove the conversion code at some point. In reality, there will always be some leftovers that have to be converted manually, which brings us to option number three...

  • Batch-convert your entities to the new format. The complexity of the logistics behind this depends greatly on the number of entities to process, your site's activity, the resources you can devote to the process, etc. Just note that using straightforward MapReduce may not be the best idea, especially if you have used the gradual conversion technique described above. This is because MapReduce fetches and processes all entities of a given kind, while only a tiny percentage of them may need converting. Hence it could be beneficial to write the conversion code by hand, querying for the old entities explicitly and, for example, using a library such as ndb.
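
To make the three options concrete, here is a minimal sketch in plain Python. It does not use the App Engine API: FakeEntity, its put() method, the "nickname" field and DEFAULT_NICKNAME are all hypothetical stand-ins that only mimic a schema-less entity, and the way old entities are selected (a simple Python filter) is an assumption that a real datastore query would have to replace.

```python
class FakeEntity(object):
    """Hypothetical in-memory stand-in for a schema-less entity (not the GAE API).

    Only the attributes passed at creation time exist on the instance, which
    mimics an old entity that was stored before a field was added.
    """
    writes = []  # hypothetical write log; every put() is recorded here

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def put(self):
        # A real entity would be written back to the datastore here.
        FakeEntity.writes.append(self)


DEFAULT_NICKNAME = "anonymous"  # hypothetical default for the new field


def read_nickname(entity):
    # Option 1: tolerate old entities when reading, without modifying them.
    return getattr(entity, "nickname", DEFAULT_NICKNAME)


def upgrade(entity):
    # Option 2: convert an old entity the first time it is touched.
    # Returns True if the entity had to be converted and was written back.
    if not hasattr(entity, "nickname"):
        entity.nickname = DEFAULT_NICKNAME
        entity.put()
        return True
    return False


def batch_convert(entities, batch_size=100):
    # Option 3: visit only the entities that still lack the field, in batches.
    # Assumption: the filtering below stands in for an explicit query.
    pending = [e for e in entities if not hasattr(e, "nickname")]
    converted = 0
    for start in range(0, len(pending), batch_size):
        for entity in pending[start:start + batch_size]:
            if upgrade(entity):
                converted += 1
    return converted
```

Calling batch_convert on a mix of old and new entities writes back only the old ones, and a second call finds nothing left to do, which is the point of not running a blanket MapReduce over the whole kind.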
Xion answered Nov 06 '22 15:11
