
Update for big count of NDB Entities fails

I have a very simple task. After a migration that added a new field (a repeated, composite property) to an existing NDB entity kind (~100K entities), I need to set a default value for it.

I tried this code first:

q = dm.E.query(ancestor=dm.E.root_key)
for user in q.iter(batch_size=500):
    user.field1 = [dm.E2()]
    user.put()

But it fails with errors like these:

2015-04-25 20:41:44.792 /**** 500 599830ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=1-17-0
W 2015-04-25 20:32:46.675 suspended generator run_to_queue(query.py:938) raised Timeout(The datastore operation timed out, or the data was temporarily unavailable.)
W 2015-04-25 20:32:46.676 suspended generator helper(context.py:876) raised Timeout(The datastore operation timed out, or the data was temporarily unavailable.)
E 2015-04-25 20:41:44.475 Traceback (most recent call last): File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in

The task runs on a separate task queue, so it has at least 10 minutes to execute, but apparently that is not enough. The NDB timeout warnings are also strange: maybe there is contention because other instances (triggered by users) update the same entities, but I'm not sure.

Anyway, I want to know the best (and simplest) practice for such a task. I know about MapReduce, but for this it currently looks overcomplicated to me.

UPDATE:

I also tried put_multi after fetching all entities into an array, but GAE stops the instance as soon as it exceeds ~600 MB of memory (the limit is 500 MB). There simply isn't enough memory to hold all ~100K entities at once.
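(For reference: instead of collecting everything and calling put_multi once, the writes can be grouped into fixed-size batches so only one batch is materialized in memory at a time. A small plain-Python chunking helper sketches the idea; the entity iterator and the batch write call are stand-ins for the NDB query and `ndb.put_multi`.)

```python
def chunks(iterable, size):
    """Yield successive lists of at most `size` items, so only one
    batch lives in memory at a time."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Each yielded batch could be passed to ndb.put_multi(batch) instead
# of loading all ~100K entities first.
print(list(chunks(range(7), 3)))  # → [[0, 1, 2], [3, 4, 5], [6]]
```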

Skie asked Nov 10 '22

1 Answer

After you execute _migrate_users(), it will process 50 users, then create another task to process the next 50, and so on. You can use a bigger page size than 50, depending on the size of your entities.

from google.appengine.ext import deferred, ndb

def _migrate_users(curs=None):
  # Fetch one page of 50 users, starting from the cursor (if any).
  users, next_curs, more = User.query().fetch_page(50, start_cursor=curs)
  for user in users:
    user.field1 = 'bla bla'
  ndb.put_multi(users)  # write the whole page in one batch
  if more:
    # Chain the next page as a fresh task, passing the cursor along,
    # so no single task runs longer than one page takes.
    deferred.defer(_migrate_users, next_curs, _queue='default')
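The chaining pattern above can be illustrated outside App Engine with plain-Python stand-ins for `fetch_page` and `deferred.defer` (the entity data, page size, and queue below are all illustrative): each invocation handles one fixed-size page, then re-enqueues itself with a cursor, so no single run holds all entities in memory or exceeds the task deadline.

```python
from collections import deque

# Illustrative stand-ins for the datastore and the task queue.
USERS = [{"id": i, "field1": None} for i in range(12)]
PAGE_SIZE = 5
task_queue = deque()

def fetch_page(page_size, start_cursor):
    """Mimics NDB's fetch_page: returns (page, next_cursor, more)."""
    cursor = start_cursor or 0
    page = USERS[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    return page, next_cursor, next_cursor < len(USERS)

def defer(func, *args):
    task_queue.append((func, args))  # stands in for deferred.defer

def _migrate_users(curs=None):
    users, next_curs, more = fetch_page(PAGE_SIZE, curs)
    for user in users:
        user["field1"] = "default"   # set the new field's default
    if more:
        defer(_migrate_users, next_curs)

# Kick off the first task and drain the fake queue, as App Engine would.
_migrate_users()
while task_queue:
    func, args = task_queue.popleft()
    func(*args)

print(all(u["field1"] == "default" for u in USERS))  # → True
```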
Dmytro Sadovnychyi answered Nov 14 '22