In profiling my Python 2.7 App Engine app, I find that it's taking an average of 7 ms per record to deserialize records fetched from ndb into Python objects (in pb_to_query_result, pb_to_entity, and their descendants; this does not include the RPC time to query the database and receive the raw records).
Is this expected? My model has six properties, one of which is a LocalStructuredProperty with 15 properties, which in turn includes a repeated StructuredProperty with four properties, but the average object should have fewer than 30 properties all told, I think.
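For illustration, a sketch of that shape in ndb terms (all names are hypothetical, abbreviated stand-ins, not my actual schema):

    from google.appengine.ext import ndb

    class Item(ndb.Model):
        # the repeated StructuredProperty with four properties
        sku = ndb.StringProperty()
        qty = ndb.IntegerProperty()
        price = ndb.FloatProperty()
        note = ndb.StringProperty()

    class Detail(ndb.Model):
        # payload of the LocalStructuredProperty (15 properties; abbreviated)
        name = ndb.StringProperty()
        created = ndb.DateTimeProperty()
        items = ndb.StructuredProperty(Item, repeated=True)
        # ...twelve more properties...

    class Record(ndb.Model):
        # six top-level properties; abbreviated
        owner = ndb.KeyProperty()
        amount = ndb.FloatProperty()
        detail = ndb.LocalStructuredProperty(Detail)
        # ...three more properties...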
I want to fetch a couple of thousand records to do some simple aggregate analysis, and while I can tolerate a certain amount of latency, more than 10 seconds is a problem. Is there anything I can do to restructure my models or my schema to make this more viable? (Other than the obvious solution of pre-calculating the aggregate analysis on a regular basis and caching the results.)
If it's unusual for it to be this slow, it would be helpful to know that, so I can go looking for whatever I might be doing to cause it.
Short answer: yes.
I find deserialization in Python to be very slow, especially where repeated properties are involved. GAE's Python deserialization apparently creates boatloads of objects; it's known to be inefficient, but apparently no one wants to touch it because it's so far down the stack.
It's unfortunate. We run F4 front-end instances most of the time because of this overhead (i.e., faster CPU == faster deserialization).
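That said, for aggregate work over a few thousand records, one restructuring worth trying is a projection query over just the scalar properties the analysis needs: unprojected properties are never deserialized, so the nested structured data is skipped entirely. A minimal sketch, reusing the hypothetical Record model from the question and assuming the projected property is indexed (LocalStructuredProperty itself can't be projected, since it's stored as an unindexed blob):

    # Projection returns lightweight entities containing only the
    # requested indexed properties; the LocalStructuredProperty blob
    # is never decoded. 'amount' is the hypothetical scalar property.
    results = Record.query().fetch(2000, projection=[Record.amount])
    average = sum(r.amount for r in results) / len(results) if results else 0.0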