
Schema migration on GAE datastore

First off, this is my first post on Stack Overflow, so please forgive any newbie missteps. If I can frame my question more clearly, please let me know.

I'm running a large application on Google App Engine and have been adding new features that force me to modify old data classes and add new ones. To clean our database and update old entries, I've been trying to write a script that iterates through the instances of a class, makes changes, and then re-saves them. The problem is that Google App Engine times out any request that runs past its deadline.

I've been struggling with this problem for several weeks. The best solution that I've found is here: http://code.google.com/p/rietveld/source/browse/trunk/update_entities.py?spec=svn427&r=427

I created a version of that code for my own website, which you can see here:

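import logging

from google.appengine.ext import db
from google.appengine.runtime import DeadlineExceededError

# Affiliate, IPN, Mail, Payment and Promotion are assumed to be this app's db.Model classes.

def schema_migration(self, target, batch_size=1000):
    last_key = None
    # Map model names to their classes so the target can be passed as a string.
    calls = {"Affiliate": Affiliate, "IPN": IPN, "Mail": Mail, "Payment": Payment, "Promotion": Promotion}

    while True:
        # Page through entities in key order, resuming after the last key saved.
        q = calls[target].all()
        if last_key:
            q.filter('__key__ >', last_key)
        q.order('__key__')
        this_batch_size = batch_size

        # Fetch the next batch, halving the batch size after each timeout.
        while True:
            try:
                batch = q.fetch(this_batch_size)
                break
            except (db.Timeout, DeadlineExceededError):
                logging.warning("Query timed out, retrying")
                if this_batch_size == 1:
                    logging.critical("Unable to update entities, aborting")
                    return
                this_batch_size //= 2

        # An empty batch means every entity has been processed.
        if not batch:
            break

        # Re-save the batch; retries on db.Timeout with no retry limit.
        keys = None
        while not keys:
            try:
                keys = db.put(batch)
            except db.Timeout:
                logging.warning("Put timed out, retrying")

        last_key = keys[-1]
        print("Updated %d records" % len(keys))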

Strangely, the code works perfectly for classes with between 100 and 1,000 instances, and the script usually takes around 10 seconds. But when I run it on classes in our database with more like 100K instances, the script runs for 30 seconds and then I receive this:

"Error: Server Error

The server encountered an error and could not complete your request. If the problem persists, please report your problem and mention this error message and the query that caused it."

Any idea why GAE is timing out after exactly thirty seconds? What can I do to get around this problem?

Thank you! Keller

Keller asked Feb 28 '11 at 00:02


1 Answer

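By the sound of it, you are hitting the second DeadlineExceededError. App Engine requests can run for at most 30 seconds each. When DeadlineExceededError is first raised, it's your job to stop processing and tidy up, because you are almost out of time; the next time it is raised, you cannot catch it. Note that the fetch loop in your code catches DeadlineExceededError and retries, which would carry on working past that first warning instead of stopping.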
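The "stop and tidy up" advice can be sketched in plain Python. This is an illustration under assumptions, not App Engine code: `DeadlineExceededError` below is a local stand-in class (the real one lives in `google.appengine.runtime`), a dict stands in for the datastore, and `migrate` is a hypothetical name. The point is that the handler records how far it got and returns, rather than retrying.

```python
class DeadlineExceededError(Exception):
    """Local stand-in for App Engine's DeadlineExceededError (assumption)."""


def migrate(entities, update, start_key=None):
    """Update entities in key order; on the first deadline, stop and report progress.

    entities: dict mapping key -> entity, standing in for the datastore.
    update:   function returning the migrated version of an entity.
    Returns (last_completed_key, finished).
    """
    last_key = start_key
    try:
        pending = sorted(k for k in entities if start_key is None or k > start_key)
        for key in pending:
            entities[key] = update(entities[key])
            last_key = key  # only advanced once the entity is saved
    except DeadlineExceededError:
        # Do not retry here: the grace period is short and a second
        # deadline cannot be caught. Hand back where we got to instead.
        return last_key, False
    return last_key, True
```

A later request (or task) can then resume by passing the returned key back in as `start_key`.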

You should look at using the Mapper API to split your migration into batches and run each batch using the Task Queue.
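The batch-per-task idea can be sketched as a chain of small tasks, each of which updates one batch and then enqueues the next. This is a local simulation, not the Mapper API or the real Task Queue: a dict stands in for the datastore, a `deque` stands in for the queue, and `run_batch` and `drain` are hypothetical names. On App Engine, each enqueue would become a real task (for example via `deferred.defer` from `google.appengine.ext.deferred`), and the sorted-key scan would be a `__key__ >` query like the one in the question.

```python
from collections import deque


def run_batch(entities, update, after_key, batch_size, queue):
    """One 'task': update up to batch_size entities after after_key, then chain."""
    batch = sorted(k for k in entities if after_key is None or k > after_key)[:batch_size]
    if not batch:
        return  # nothing left, so the chain ends here
    for key in batch:
        entities[key] = update(entities[key])
    # Enqueue the next batch as a fresh task so no single request runs long.
    queue.append((run_batch, (entities, update, batch[-1], batch_size, queue)))


def drain(queue):
    """Stand-in for the Task Queue service: run tasks until none remain."""
    executed = 0
    while queue:
        func, args = queue.popleft()
        func(*args)
        executed += 1
    return executed
```

Each task only ever touches `batch_size` entities, so the per-request deadline applies to one batch rather than to the whole migration.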

Chris Farmiloe answered Oct 10 '22 at 17:10