What response times can be expected from GAE/NDB?

We are currently building a small and simple central HTTP service that maps "external identities" (like a facebook id) to an "internal (uu)id", unique across all our services to help with analytics.

The first prototype in "our stack" (Flask + PostgreSQL) was done within a day. But since we want the service to (almost) never fail and to scale automagically, we decided to use Google App Engine.

After a week of reading, trying and benchmarking, this question emerges:

What response times are considered "normal" on App Engine (with NDB)?

We are getting response times that are consistently above 500ms on average and well above 1s at the 90th percentile.

I've attached a stripped-down version of our code below, hoping somebody can point out the obvious flaw. We really like the autoscaling and the distributed storage, but we cannot imagine that 500ms really is the expected performance in our case. The SQL-based prototype responded much faster (and consistently so), hosted on a single Heroku dyno using the free, cache-less PostgreSQL (even with an ORM).

We tried both synchronous and asynchronous variants of the code below and looked at the Appstats profile. It's always the RPC calls (both memcache and datastore) that take very long (50ms-100ms), made worse by the fact that there are always multiple calls (e.g. mc.get() + ds.get() + ds.set() on a write). We also tried deferring as much as possible to the task queue, without noticeable gains.

import json
import uuid

from google.appengine.ext import ndb

import webapp2
from webapp2_extras.routes import RedirectRoute


def _parse_request(request):
    if request.content_type == 'application/json':
        try:
            body_json = json.loads(request.body)
            provider_name = body_json.get('provider_name', None)
            provider_user_id = body_json.get('provider_user_id', None)
        except ValueError:
            return webapp2.abort(400, detail='invalid json')
    else:
        provider_name = request.params.get('provider_name', None)
        provider_user_id = request.params.get('provider_user_id', None)

    return provider_name, provider_user_id


class Provider(ndb.Model):
    name = ndb.StringProperty(required=True)


class Identity(ndb.Model):
    user = ndb.KeyProperty(kind='GlobalUser')


class GlobalUser(ndb.Model):
    uuid = ndb.StringProperty(required=True)

    @property
    def identities(self):
        return Identity.query(Identity.user==self.key).fetch()


class ResolveHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def post(self):
        provider_name, provider_user_id = _parse_request(self.request)

        if not provider_name or not provider_user_id:
            return self.abort(400, detail='missing provider_name and/or provider_user_id')

        identity = ndb.Key(Provider, provider_name, Identity, provider_user_id).get()

        if identity:
            user_uuid = identity.user.id()
        else:
            user_uuid = uuid.uuid4().hex

            GlobalUser(
                id=user_uuid,
                uuid=user_uuid
            ).put_async()

            Identity(
                parent=ndb.Key(Provider, provider_name),
                id=provider_user_id,
                user=ndb.Key(GlobalUser, user_uuid)
            ).put_async()

        return webapp2.Response(
            status='200 OK',
            content_type='application/json',
            body = json.dumps({
                'provider_name' : provider_name,
                'provider_user_id' : provider_user_id,
                'uuid' : user_uuid
            })
        )

app = webapp2.WSGIApplication([
      RedirectRoute('/v1/resolve', ResolveHandler, 'resolve', strict_slash=True)
], debug=False)

For completeness' sake, the (almost default) app.yaml:

application: GAE_APP_IDENTIFIER
version: 1
runtime: python27
api_version: 1
threadsafe: yes

handlers:
- url: .*
  script: main.app

libraries:
- name: webapp2
  version: 2.5.2
- name: webob
  version: 1.2.3

inbound_services:
- warmup

tinnet asked Feb 14 '13 15:02


2 Answers

In my experience, RPC performance fluctuates by more than an order of magnitude, between 5ms and 100ms for a datastore get. I suspect it's related to the GAE datacenter load. Sometimes it gets better, sometimes it gets worse.

Your operation looks very simple. I expect that with 3 requests, it should take about 20ms, but it could be up to 300ms. A sustained average of 500ms sounds very high though.
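The estimate above is just arithmetic on the per-RPC range. As a rough sketch (a simplified model that ignores instance startup, request parsing and network overhead; the function names are made up for illustration), the bounds for serial versus parallel RPCs can be worked out like this:

```python
def serial_rpc_budget(n_rpcs, fast_ms, slow_ms):
    """Best- and worst-case total latency when RPCs run one after another."""
    return n_rpcs * fast_ms, n_rpcs * slow_ms


def parallel_rpc_budget(fast_ms, slow_ms):
    """Bounds when all RPCs are issued at once (async): the slowest one dominates.

    Simplifying assumption: the RPCs really do overlap completely.
    """
    return fast_ms, slow_ms


# 3 RPCs, each somewhere between 5ms and 100ms (the range quoted above)
best, worst = serial_rpc_budget(3, 5, 100)
print("serial:   best=%dms worst=%dms" % (best, worst))

best_p, worst_p = parallel_rpc_budget(5, 100)
print("parallel: best=%dms worst=%dms" % (best_p, worst_p))
```

Even the serial worst case in this model stays at 300ms, which is why a sustained 500ms average suggests something beyond plain RPC latency.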

ndb does local caching when fetching objects by ID. That should kick in if you're accessing the same users, and those requests should be much faster.

I assume you're doing perf testing in production and not on dev_appserver. dev_appserver performance is not representative.

Not sure how many iterations you've tested, but you might want to try a larger number to see if 500ms is really your average.
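To check whether 500ms really is the average, it may help to compute the mean and percentiles over a large set of measured response times. A minimal sketch, assuming you have collected per-request latencies in milliseconds (the sample values below are invented, and summarize_latencies is a hypothetical helper, not part of any GAE API):

```python
def summarize_latencies(samples_ms):
    """Return count, mean, median and 90th percentile of latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank method: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, (p * n + 99) // 100)  # integer ceil(p * n / 100)
        return ordered[rank - 1]

    return {
        "count": n,
        "mean": sum(ordered) / float(n),
        "p50": percentile(50),
        "p90": percentile(90),
    }


# Invented sample data, in milliseconds
samples = [120, 95, 480, 510, 130, 1100, 90, 105, 530, 98]
stats = summarize_latencies(samples)
print(stats)
```

With only a handful of samples a single slow request skews the mean heavily, so the median and 90th percentile over many requests give a better picture.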

When you're blocked on simple RPC calls, there's not too much optimizing you can do.

dragonx answered Oct 27 '22 08:10


The first obvious point I see: do you really need a transaction on every request?

I believe that unless most of your requests create new entities, it's better to do the .get_by_id() outside of a transaction. If the entity is not found, then start a transaction or, even better, defer the creation of the entity.

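from google.appengine.ext import deferred, ndb


def request_handler(key, data):
    # Fast path: a plain get, outside of any transaction.
    entity = key.get()
    if entity is None:
        # Slow path: create the entity later, on the task queue.
        deferred.defer(_deferred_create, key, data)
    return 'ok'


def _deferred_create(key, data):
    @ndb.transactional
    def _tx():
        # Re-check inside the transaction so that concurrent tasks
        # do not create the same entity twice.
        if key.get() is None:
            # CreateEntity is a placeholder for your own code that
            # builds the entity (it should be stored under `key`).
            entity = CreateEntity(data)
            entity.put()
    _tx()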

That should give much better response time for user facing requests.

The second (and only other) optimization I see is to use ndb.put_multi() to minimize the number of RPC calls.

P.S. I'm not 100% sure, but you can try disabling multithreading (threadsafe: no) to get more stable response times.

Alexander Trakhimenok answered Oct 27 '22 08:10