
App Engine deserializing records in python: is it really this slow?

In profiling my Python 2.7 App Engine app, I find that it takes an average of 7 ms per record to deserialize records fetched from ndb into Python objects. (This is the time spent in pb_to_query_result, pb_to_entity and their descendants; it does not include the RPC time to query the database and receive the raw records.)
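For anyone wanting to reproduce this kind of per-record figure, here is a minimal sketch of timing a decode step on its own, separate from any RPC. It is not ndb code: `mean_ms_per_record` and `fake_decode` are hypothetical names, and `fake_decode` is only a stand-in for the real deserialization, which is not available outside App Engine.

```python
import time


def mean_ms_per_record(decode, raw_records):
    """Return the average wall-clock milliseconds spent in decode() per record."""
    start = time.time()
    for raw in raw_records:
        decode(raw)
    elapsed = time.time() - start
    return 1000.0 * elapsed / len(raw_records)


def fake_decode(raw):
    """Hypothetical stand-in for deserialization: parse 'k=v;k=v' into a dict."""
    return dict(pair.split("=") for pair in raw.split(";"))


# The raw records are built up front, so only the decode step is timed.
raw_records = ["a=1;b=2;c=3"] * 1000
print("%.4f ms per record" % mean_ms_per_record(fake_decode, raw_records))
```

The absolute number printed here says nothing about ndb; the point is only that the measurement covers the decode loop and nothing else.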

Is this expected? My model has six properties, one of which is a LocalStructuredProperty with 15 properties, which in turn includes a repeated StructuredProperty with four properties. All told, the average object should have fewer than 30 properties, I think.

I want to fetch a couple of thousand records to do some simple aggregate analysis, and while I can tolerate a certain amount of latency, over 10 seconds is a problem. Is there anything I can do to restructure my models or my schema to make this more viable? (Other than the obvious solution of pre-calculating my aggregate analysis on a regular basis and caching the results.)
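As a rough back-of-the-envelope check on those numbers, the sketch below assumes "a couple of thousand" means 2,000 records (an assumption, not a measured count) and uses the 7 ms per record figure from the profile. The function names are made up for illustration.

```python
def total_seconds(records, ms_per_record):
    """Total deserialization time in seconds for a batch of records."""
    return records * ms_per_record / 1000.0


def budget_ms_per_record(records, target_seconds):
    """Per-record time budget in milliseconds needed to hit a total target."""
    return target_seconds * 1000.0 / records


# Assumed batch size: 2,000 records; 7 ms per record is the profiled average.
print(total_seconds(2000, 7))         # 14.0 seconds of pure deserialization
print(budget_ms_per_record(2000, 10)) # 5.0 ms per record to stay at 10 seconds
```

Under those assumptions, deserialization alone takes about 14 seconds, and it would have to drop to 5 ms per record just to reach the 10 second mark, before counting any RPC time.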

If it's unusual for it to be this slow, it would be helpful to know that so I can go and look for what I might be doing that impairs it.

Tim Dierks asked Aug 15 '13 18:08



1 Answer

Short answer: yes.

I find deserialization in Python to be very slow, especially where repeated properties are involved. Apparently, GAE-Python deserialization creates boatloads of objects. It's known to be inefficient, but also apparently, no one wants to touch it because it's so far down the stack.

It's unfortunate. We run F4 Front Ends most of the time due to this overhead (i.e., faster CPU == faster deserialization).

JasonC answered Oct 19 '22 09:10
