
Migrating data when changing an NDB field's property type

Suppose I initially create an ndb.Model and later want to change a field's ndb property type (e.g. IntegerProperty to StringProperty), casting the data already stored in that field so that I don't lose it. One method would be to create a new field under a different name and migrate the data over with a script, but are there more convenient ways of accomplishing this?

For example, suppose I had the following model:

class Car(ndb.Model):
    name = ndb.StringProperty()
    production_year = ndb.IntegerProperty()

And I stored an instance of the entity:

c = Car()  # Python has no `new` keyword
c.name = "Porsche"
c.production_year = 2013
c.put()  # write the entity to the datastore

Now I want to change production_year to an ndb.StringProperty() without "losing" the value I set (it would still exist, but would not be retrievable). If I just change production_year to an instance of ndb.StringProperty(), the field no longer reports a value, which makes sense since the type doesn't match.

So if I changed the model to:

class Car(ndb.Model):
    name = ndb.StringProperty()
    production_year = ndb.StringProperty()

Attempting to retrieve the field with dot notation would result in a value of None. Has anyone run into this situation, and could you explain what you did to solve it? Thanks.

random_stackoverflow_user asked Nov 07 '13 17:11


2 Answers

How you approach this will depend on how many entities you have. If you have a relatively small number of entities, say in the tens of thousands, I would just use the remote_api to retrieve the raw underlying data from the datastore, manipulate it directly, and write it back, without using the models. For instance, the following fetches raw entities whose properties can be accessed like a dictionary. This code is pretty much lifted from the lower-level App Engine SDK code.

from google.appengine.api import datastore
from google.appengine.api import datastore_errors

def get_entities(keys):
    rpc = datastore.GetRpcFromKwargs({})
    keys, multiple = datastore.NormalizeAndTypeCheckKeys(keys)
    entities = None
    try:
        entities = datastore.Get(keys, rpc=rpc)
    except datastore_errors.EntityNotFoundError:
        assert not multiple

    return entities

def put_entities(entities):
    rpc = datastore.GetRpcFromKwargs({})
    keys = datastore.Put(entities, rpc=rpc)
    return keys

You would use this as follows (I am using fetch to simplify the code a bit for this example):

x = Car.query(keys_only=True).fetch(100)
results = get_entities([i.to_old_key() for i in x])

for i in results:
    i['production_year'] = unicode(i['production_year'])

put_entities(results)

This is old code I have, and datastore.NormalizeAndTypeCheckKeys takes the old db-style key. I haven't looked to see if there is an equivalent function for ndb-style keys, but this does work. (Just tested it ;-)
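
One caveat with the conversion loop above: unicode(i['production_year']) raises a KeyError for an entity that lacks the property, and turns an unset (None) value into the string "None". A slightly more defensive version of that step might look like the sketch below. It assumes only that the raw entity behaves like a dict; the helper name stringify_property is made up for this example and is not part of the SDK.

```python
# type(u"") is `unicode` on Python 2 and `str` on Python 3, so the
# sketch runs on either interpreter.
text_type = type(u"")


def stringify_property(entity, name):
    """Cast entity[name] to text in place.

    `entity` is assumed to be any dict-like object (hypothetical helper,
    not an SDK function). Returns True if the entity was modified and
    therefore needs to be written back, False otherwise.
    """
    value = entity.get(name)
    if value is None:
        # Property missing or unset: leave it alone rather than
        # storing the literal string "None".
        return False
    if isinstance(value, text_type):
        # Already text, e.g. because the script is being re-run.
        return False
    entity[name] = text_type(value)
    return True


# Example with a plain dict standing in for a raw entity.
car = {"name": u"Porsche", "production_year": 2013}
changed = stringify_property(car, "production_year")
```

With the raw entities from get_entities, only those for which the helper returns True would need to be passed to put_entities, which also makes the script safe to re-run after a partial failure.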

This approach allows you to migrate data without deploying any new code.
If you have millions of entities then you should look at other approaches for processing, i.e. using this code together with mapreduce.
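
Between a single fetch(100) and a full mapreduce job there is a middle ground: a plain sequential loop that walks the keys one page at a time. The sketch below is not mapreduce and is only an illustration. All four callables are hypothetical hooks supplied by the caller. With ndb, fetch_page could plausibly wrap Car.query().fetch_page(batch_size, start_cursor=cursor, keys_only=True), which returns a (results, cursor, more) tuple, followed by the to_old_key() conversion shown earlier.

```python
def migrate_in_batches(fetch_page, get_entities, put_entities, convert,
                       batch_size=100):
    """Run convert() over every entity, one page of keys at a time.

    All four callables are hypothetical hooks supplied by the caller:
      fetch_page(batch_size, cursor) -> (keys, next_cursor, more)
      get_entities(keys)             -> list of dict-like raw entities
      put_entities(entities)         -> writes the entities back
      convert(entity)                -> mutates one entity in place
    Returns the number of entities written.
    """
    cursor = None  # None means "start from the beginning"
    more = True
    written = 0
    while more:
        keys, cursor, more = fetch_page(batch_size, cursor)
        if not keys:
            break  # nothing left to process
        entities = get_entities(keys)
        for entity in entities:
            convert(entity)
        put_entities(entities)
        written += len(entities)
    return written
```

Working page by page keeps memory use bounded by batch_size, and the cursor could be logged after each page so that an interrupted run can resume where it stopped. For very large datasets a parallel approach such as mapreduce is still likely to be the better fit.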

Tim Hoffman answered Nov 20 '22 22:11


Just adding to Tim's answer: if you want to change your property to Text instead, you can do the following:

from google.appengine.api import datastore_types

(...)

for i in results:
    i['production_year'] = datastore_types.Text(i['production_year'])

feroult answered Nov 20 '22 21:11