
How can I reduce Google App Engine datastore latency?

Through appstats, I can see that my datastore queries take about 125ms (API and CPU combined), but there are often long delays (e.g. up to 12000ms) before the queries are executed.

I can see that the latency is not related to my query (the same query against the same data shows vastly different latencies), so I'm assuming it's a scheduling issue with App Engine.

Are other people seeing the same problem?

Is there some way to reduce the latency (e.g. an admin console setting)?

Here's a screenshot from appstats. This servlet does very little CPU processing. It does a getObjectById and then a datastore query. The query has an OR operator, so App Engine converts it into 3 queries.

[appstats screenshot] As you can see, it takes 6000ms before the first getObjectById is even executed. There is no processing before the get operation (other than getting the pm). I thought this 6000ms latency might be due to instance warm-up, so I increased my idle instances to 2 to prevent any warm-ups.

Then there's a second delay of around 1000ms between the getObjectById and the query. There are zero lines of code between the get and the query; the code simply takes the result of the getObjectById and uses the data as part of the query.

The grand total is 8097ms, yet my datastore operations (and 99.99% of the servlet) account for only 514ms (45ms API), which leaves about 7583ms unexplained. The numbers change every time I run the servlet. Here is another appstats screenshot from the same servlet run against the same data.

Here are the basics of my Java code. I had to remove some of the details for security purposes.

user = pm.getObjectById(User.class, userKey);
// build queryBuilder.append(...
final Query query = pm.newQuery(UserAccount.class, queryBuilder.toString());
query.setOrdering("rating descending");
query.executeWithArray(args);

Edited: Using Pingdom, I can see that GAE latency varies from 450ms to 7,399ms, a spread of roughly 16x! This is with two idle instances and no users on the site.

chow asked Jan 19 '13 15:01





2 Answers

I observed very similar latencies (in the 7000-10000ms range) in some of my apps. I don't think the bulk of the issue (those 6000ms) lies in your code.

In my observations, the issue is related to App Engine spinning up a new instance. Setting min idle instances may help mitigate it, but it will not solve it (I tried up to 2 idle instances). Even if you have N idle instances, App Engine will prefer to spin up dynamic ones, even when a single request comes in, and will "save" the idle ones for crazy traffic spikes. This is highly counter-intuitive, because you'd expect it to use the instances that are already around and spin up dynamic ones for future requests.

Anyway, in my experience this issue (10000ms latency) very rarely happens under any non-zero amount of load, and many people have had to resort to some kind of pinging (possibly cron jobs) every couple of minutes to keep dynamic instances around for users who hit the site when no one else is on. This used to work with a 5-minute interval, but lately instances are dying faster, so it's more like a ping every 2 minutes. Pinging is not ideal because it eats into your free quota (pinging every 5 minutes will eat more than half of it), but I really haven't found a better alternative so far.
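For illustration, here is a minimal sketch of the pinging idea as an external Java client, rather than an App Engine cron job. It assumes you run it on some machine outside App Engine. The class name KeepAlivePinger, the /keepalive path and the app URL are all hypothetical placeholders, not anything from the question.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical keep-alive client: requests one URL at a fixed interval. */
class KeepAlivePinger {

    /** Number of pings sent per day at the given interval. */
    static int pingsPerDay(int intervalMinutes) {
        return (24 * 60) / intervalMinutes;
    }

    /** Sends a single GET request and returns the HTTP status code. */
    static int ping(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(30_000);    // generous, since a cold start can be slow
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumption: replace with a cheap handler in your own app.
        final String url = "https://your-app-id.appspot.com/keepalive";
        final int intervalMinutes = 2;

        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        // Runs until the process is stopped.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println("status=" + ping(url));
            } catch (IOException e) {
                System.err.println("ping failed: " + e.getMessage());
            }
        }, 0, intervalMinutes, TimeUnit.MINUTES);
    }
}
```

With these numbers, pingsPerDay(5) is 288 and pingsPerDay(2) is 720, so dropping the interval from 5 to 2 minutes means 2.5 times as many requests counted against quota.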

To recap: in general I've found that App Engine is awesome under load, but not outstanding when you have only a few (1-3) users on the site.

JohnIdol answered Oct 25 '22 21:10



Appstats only helps diagnose performance issues when you make GAE API/RPC calls.

In the case of your diagram, the "blank" time is spent running your code on your instance. It's not going to be scheduling time.

Your guess that the initial delay is due to instance warm-up is very likely correct. It may be framework code that is executing. I can't guess at the cause of the delay between the get and the query. There may be zero lines of code between them, but you may have called some function in the query that takes time to process.

Without knowledge of the language, framework or the actual code, no one will be able to help you.

You'll need to add some sort of performance tracing on your own in order to diagnose this. The simplest (but not highly accurate) way to do this is to add timers and log timer values as your code executes.
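As a rough sketch of the timer approach, something like the following could be wrapped around the get and the query. RequestTimer is a hypothetical helper, not part of App Engine or JDO, and the Thread.sleep calls only stand in for the real datastore calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical helper: records wall-clock time between named checkpoints. */
class RequestTimer {
    private static final Logger LOG =
            Logger.getLogger(RequestTimer.class.getName());

    private final long startNanos = System.nanoTime();
    private long lastNanos = startNanos;
    // Insertion-ordered so the log line reads in execution order.
    private final Map<String, Long> splitsMillis = new LinkedHashMap<>();

    /** Stores and returns the milliseconds elapsed since the previous checkpoint. */
    long mark(String label) {
        long now = System.nanoTime();
        long elapsedMillis = (now - lastNanos) / 1_000_000;
        lastNanos = now;
        splitsMillis.put(label, elapsedMillis);
        return elapsedMillis;
    }

    /** Milliseconds elapsed since this timer was created. */
    long totalMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    Map<String, Long> splits() {
        return splitsMillis;
    }

    /** Writes all recorded splits and the running total as one log line. */
    void log() {
        LOG.info("timings(ms)=" + splitsMillis + " total=" + totalMillis());
    }

    public static void main(String[] args) throws InterruptedException {
        RequestTimer timer = new RequestTimer(); // create at the top of the handler
        Thread.sleep(20);                        // stand-in for pm.getObjectById(...)
        timer.mark("get");
        Thread.sleep(30);                        // stand-in for building and running the query
        timer.mark("query");
        timer.log();
    }
}
```

Creating the timer as the first statement of the servlet and marking before and after each datastore call should show whether the "blank" time falls inside your handler or before it starts.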

dragonx answered Oct 25 '22 22:10
