Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Updating a large number of entities in a datastore on Google App Engine

I would like to perform a small operation on all entities of a specific kind and rewrite them to the datastore. I currently have 20,000 entities of this kind but would like a solution that would scale to any amount.

What are my options?

like image 331
Tomer Weller Avatar asked Jun 26 '12 08:06

Tomer Weller


People also ask

How do I update my Datastore entity?

To update an existing entity, modify the attributes of the Entity object, then pass it to the DatastoreService. put() method. The object data overwrites the existing entity. The entire object is sent to Datastore with every call to put() .

What is indexing in Datastore?

An index is defined on a list of properties of a given entity kind, with a corresponding order (ascending or descending) for each property. For use with ancestor queries, the index may also optionally include an entity's ancestors. An index table contains a column for every property named in the index's definition.

Is Google Datastore deprecated?

The Cloud Datastore Administration API v1beta1 is now deprecated. The Cloud Datastore Admin backup feature is being phased out in favor of the managed export and import for Cloud Datastore. Please migrate to the managed export and import functionality at your earliest convenience.

What are Datastore entities?

Data objects in Firestore in Datastore mode are known as entities. An entity has one or more named properties, each of which can have one or more values. Entities of the same kind do not need to have the same properties, and an entity's values for a given property do not all need to be of the same data type.


2 Answers

Use a mapper - this is part of the MapReduce framework, but you only want the first component, map, as you don't need the shuffle/reduce step if you're simply mutating datastore entities.

like image 154
Daniel Roseman Avatar answered Oct 16 '22 06:10

Daniel Roseman


Daniel is correct, but if you don't want to mess up with the mapper, that requires you to add another library to your app you can do it using Task Queues or even simpler using the deferred library that is included since SDK 1.2.3.

20.000 entities it's not that dramatic and I assume that this task is not going be performed in regular basis (but even if it does, it is feasible).

Here is an example using NDB and the deferred library (you can easily do that using DB, but consider switching to NDB anyway if you are not already using it). It's a pretty straight forward way, but without caring much about the timeouts:

def update_model(limit=1000):
  more_cursor = None
  more = True
  while more:
    model_dbs, more_cursor, more = Model.query().fetch_page(limit, start_cursor=more_cursor)
    for model_db in model_dbs:
      model_db.updated = True
    ndb.put_multi(model_dbs)
    logging.info('### %d entities were updated' % len(model_dbs))

class UpdateModelHandler(webapp2.RequestHandler):
  def get(self):
    deferred.defer(update_model, _queue='queue')
    self.response.headers['Content-Type'] = 'text/html'
    self.response.out.write('The task has been started!')
like image 37
Lipis Avatar answered Oct 16 '22 07:10

Lipis