Mass updates in Google App Engine Datastore

What is the proper way to perform mass updates on entities in a Google App Engine Datastore? Can it be done without having to retrieve the entities?

For example, what would be the GAE equivalent of something like this in SQL:

UPDATE dbo.authors
SET    city = replace(city, 'Salt', 'Olympic')
WHERE  city LIKE 'Salt%';
Yarin asked Nov 19 '11 20:11

2 Answers

There isn't a direct translation. The datastore really has no concept of updates; all you can do is overwrite old entities with a new entity at the same address (key). To change an entity, you must fetch it from the datastore, modify it locally, and then save it back.

There's also no equivalent to the LIKE operator. While prefix matching (a trailing wildcard, as in 'Salt%') is possible with some tricks, if you wanted to match '%Salt%' you'd have to read every single entity into memory and do the string comparison locally.
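
One common form of the prefix trick is a range comparison: every string that starts with 'Salt' sorts at or after 'Salt' and before 'Salt' followed by a very high code point. The sketch below shows only the comparison logic on a plain Python list; the helper names and the sample cities are made up for illustration. Against the datastore the same idea would be expressed as two inequality filters on the same property, which is an assumption about how you would wire it up rather than something stated above.

```python
# Sketch: emulate SQL "city LIKE 'Salt%'" as a range over string values.
# The helper names and sample data here are hypothetical.

def prefix_bounds(prefix):
    """Return (low, high) so that low <= s < high holds for strings
    starting with prefix (for characters up to U+FFFD)."""
    return prefix, prefix + u"\ufffd"


def matches_prefix(value, prefix):
    """True if value falls inside the range that covers the prefix."""
    low, high = prefix_bounds(prefix)
    return low <= value < high


cities = [u"Salt Lake City", u"Salta", u"Salem", u"Basalt", u"Saltillo"]
matching = [c for c in cities if matches_prefix(c, u"Salt")]
```

Note that 'Basalt' is excluded: the range only covers a leading match, which is why a '%Salt%' search still needs a full scan.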

So it's not going to be quite as clean or efficient as SQL. This is a tradeoff with most distributed object stores, and the datastore is no exception.

That said, the mapper library is available to facilitate such batch updates. Follow the example and use something like this for your process function:

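from mapreduce import operation as op

def process(entity):
  # Rewrite only cities that start with 'Salt' (the LIKE 'Salt%' case).
  if entity.city.startswith('Salt'):
    entity.city = entity.city.replace('Salt', 'Olympic')
    # Yielding the put lets the mapper batch the writes.
    yield op.db.Put(entity)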

There are other alternatives besides the mapper. The most important optimization tip is to batch your updates; don't save back each updated entity individually. If you use the mapper and yield puts, this is handled automatically.
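
If you are not using the mapper, batching can be done by hand. The following is a minimal sketch of the idea, not the mapper's own implementation: `update_in_batches`, `rename_city`, and the `put_batch` callback are hypothetical names, and entities are modeled as plain dicts so the example runs anywhere. In a real handler, `put_batch` would be whatever call saves a list of entities in one round trip.

```python
# Sketch: collect modified entities and save them in groups rather than
# one at a time. All names below are hypothetical, for illustration only.

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def update_in_batches(entities, transform, put_batch, batch_size=100):
    """Apply transform to each entity and save the changed ones in groups.

    transform(entity) must return True if it modified the entity.
    Returns the number of batched writes issued.
    """
    changed = (e for e in entities if transform(e))
    writes = 0
    for batch in chunked(changed, batch_size):
        put_batch(batch)
        writes += 1
    return writes


def rename_city(entity):
    """Same rewrite as the SQL example; reports whether anything changed."""
    if entity["city"].startswith("Salt"):
        entity["city"] = entity["city"].replace("Salt", "Olympic")
        return True
    return False


authors = [{"city": c} for c in
           ["Salt Lake City", "Boise", "Salta", "Saltillo", "Provo"]]
saved = []  # stands in for the datastore: records each batch written
write_count = update_in_batches(authors, rename_city, saved.append, batch_size=2)
```

Here three entities change and are written in two calls instead of three; with realistic batch sizes the saving is much larger.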

Drew Sears answered Oct 19 '22 14:10

No, it can't be done without retrieving the entities.

There's no such thing as a '1000 max record limit', but there is of course a timeout on any single request, and if you have a large number of entities to modify, a simple iteration will probably fall foul of that. You could manage this by splitting the work into multiple operations and keeping track of your position with a query cursor, or potentially by using the MapReduce framework.
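
The cursor approach can be sketched roughly as follows. This is a simulation of the control flow only: `fetch_page` is a hypothetical stand-in for a datastore query resumed from a cursor, the cursor is modeled as a list offset, and entities are plain dicts. In a real application each call to `process_one_slice` would run in its own request (for example, a chained task), passing the returned cursor along to the next one.

```python
# Sketch: process a large result set one slice per request, resuming from
# a cursor each time. fetch_page and the in-memory store are hypothetical.

def fetch_page(store, cursor, limit):
    """Return (entities, next_cursor); next_cursor is None when exhausted."""
    start = cursor or 0
    page = store[start:start + limit]
    end = start + len(page)
    next_cursor = end if end < len(store) else None
    return page, next_cursor


def process_one_slice(store, cursor, limit, transform):
    """Do one request's worth of work and return where to resume."""
    page, next_cursor = fetch_page(store, cursor, limit)
    for entity in page:
        transform(entity)
    return next_cursor


def rename_city(entity):
    """Same rewrite as the SQL example in the question."""
    if entity["city"].startswith("Salt"):
        entity["city"] = entity["city"].replace("Salt", "Olympic")


store = [{"city": c} for c in
         ["Salt Lake City", "Boise", "Salta", "Saltillo", "Provo"]]

# Each loop iteration stands in for a separate, short-lived request.
cursor, slices = None, 0
while True:
    cursor = process_one_slice(store, cursor, 2, rename_city)
    slices += 1
    if cursor is None:
        break
```

Keeping each slice small is what keeps every individual request well under the timeout.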

Daniel Roseman answered Oct 19 '22 13:10
