I've noticed periodic but consistent latency spikes from my app running on App Engine. At first I thought the network might be slow, but Appstats confirmed that wasn't the case.
I've been able to reproduce the latency spikes with both older and newer versions of the SDK; currently I'm using the following:
Usage on the app is pretty low; over the last 30 days I'm generally under 0.04 requests per second:
Most work is done with one instance as well:
Most operations' latency is well under a second, but an alarming number of requests take 10-30x longer.
So I figured it must just be network latency, but every Appstats trace of a slow operation disproved this. Datastore and the network have always been incredibly reliable. Here is the anatomy of a slow request taking over 30 seconds:
Here is the anatomy of a normal request:
At a high level my code is pretty uninteresting: it's a simple API that makes a few network calls and saves / reads data from Cloud Datastore. The entire source can be found on GitHub here. The app runs on a single auto-scaling App Engine instance and is warmed up.
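For context, the handlers are roughly this shape (an illustrative sketch only; the entity and method names below are made up, the real code is in the linked repo):

import static com.googlecode.objectify.ObjectifyService.ofy;

import com.google.api.server.spi.config.Api;
import com.google.api.server.spi.config.ApiMethod;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

@Entity
class SyncRecord {
    @Id Long id;
    String payload;
}

@Api(name = "saucesync")
public class SauceSyncEndpoint {

    @ApiMethod(name = "records.save")
    public SyncRecord save(SyncRecord record) {
        // Create and persist a small entity with Objectify; Endpoints serializes the return value to JSON.
        ofy().save().entity(record).now();
        return record;
    }
}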
CPU usage over the last month doesn't seem to show anything exciting either:
It's really strange to see that even for quick operations a huge percentage of time is spent on the CPU, even though the code simply creates a few objects, persists them, and returns JSON. I'm wondering if the CPU on my App Engine instance is getting pegged by another app, which could be causing performance to degrade periodically.
My appengine-web.xml config looks like this:
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application>sauce-sync</application>
    <version>1</version>
    <threadsafe>true</threadsafe>
    <automatic-scaling>
        <!-- always keep an instance up in order to keep startup time low -->
        <min-idle-instances>1</min-idle-instances>
    </automatic-scaling>
</appengine-web-app>
And my web.xml looks like this:
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <servlet>
        <servlet-name>SystemServiceServlet</servlet-name>
        <servlet-class>com.google.api.server.spi.SystemServiceServlet</servlet-class>
        <init-param>
            <param-name>services</param-name>
            <param-value>com.sauce.sync.SauceSyncEndpoint</param-value>
        </init-param>
    </servlet>
    <servlet-mapping>
        <servlet-name>SystemServiceServlet</servlet-name>
        <url-pattern>/_ah/spi/*</url-pattern>
    </servlet-mapping>

    <!-- reaper -->
    <servlet>
        <servlet-name>reapercron</servlet-name>
        <servlet-class>com.sauce.sync.reaper.ReaperCronServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>reapercron</servlet-name>
        <url-pattern>/reapercron</url-pattern>
    </servlet-mapping>
    <servlet>
        <servlet-name>reaper</servlet-name>
        <servlet-class>com.sauce.sync.reaper.ReaperServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>reaper</servlet-name>
        <url-pattern>/reaper</url-pattern>
    </servlet-mapping>

    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
    </welcome-file-list>

    <filter>
        <filter-name>ObjectifyFilter</filter-name>
        <filter-class>com.googlecode.objectify.ObjectifyFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>ObjectifyFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
</web-app>
TL;DR: I'm completely stuck. I'm not sure how to debug or fix this issue, and I'm starting to think this is business as usual for smaller apps on App Engine.
I'm thinking about turning off the resident instance for a while, in the hope that my app has just been running on bunk hardware or alongside an app that is consuming a lot of resources. Has anyone run into similar performance issues, or does anyone know of additional ways to profile an app?
EDIT:
I've tried running on 1 resident instance, and I've also tried setting concurrent requests to 2-4 per this question, with no results. Logs and Appstats both confirm that an inordinate amount of time is spent waiting for my code to start running. Here is a request that takes 25 seconds before my first line of code runs; I'm not sure what Endpoints / App Engine is doing in that time.
Again, load is still low and this request is on a warmed-up instance.
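For reference, the concurrent-request limit is just another automatic-scaling element in appengine-web.xml; this is roughly what I tried (2 being the low end of the 2-4 range mentioned above):

<automatic-scaling>
    <min-idle-instances>1</min-idle-instances>
    <max-concurrent-requests>2</max-concurrent-requests>
</automatic-scaling>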
EDIT 2:
It seems that, for whatever reason, App Engine + Endpoints doesn't play well with min-idle-instances set. Reverting to the default App Engine config fixed my problem.
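For anyone hitting the same thing, "default config" here just means dropping the automatic-scaling overrides entirely, i.e. roughly:

<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application>sauce-sync</application>
    <version>1</version>
    <threadsafe>true</threadsafe>
    <!-- no <automatic-scaling> block: the scheduler defaults apply -->
</appengine-web-app>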
Occasional spikes in ping values are what we call latency spikes: at those moments the service we're using feels slow or becomes unusable. The key element of latency is delay.
With high latency peaks you see, for example, audio and video falling out of sync in a video stream, or an opponent's movements arriving too late in an online game (what's known as lag). The more pronounced these delays are, the more they degrade the experience.
When you observe how your app serves users, you find that some users experience higher app latency than others. The differences are amplified with more diverse populations of users and higher usage of resources.
Because the client app's view of latency determines user experience to a greater degree, that's the type of latency to focus on. Monitoring is more appropriate than tracing for identifying the effect of high CPU utilization, because high CPU isn't generally the result of a single HTTP request.
I don't have an answer, but I can offer you some debugging tips.
Appstats may or may not be reporting correctly. However, log messages are timestamped. Log before and after each of the RPC operations; that should give you some insight.
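A minimal sketch of what that logging might look like with java.util.logging (the entity name "SauceEntity" is a placeholder, not your actual code):

import static com.googlecode.objectify.ObjectifyService.ofy;

import java.util.logging.Logger;

public class TimedLoad {
    private static final Logger log = Logger.getLogger(TimedLoad.class.getName());

    SauceEntity load(Long id) {
        long start = System.currentTimeMillis();
        log.info("datastore load starting");  // timestamped automatically in the App Engine logs
        // The RPC under suspicion: a synchronous Objectify get.
        SauceEntity e = ofy().load().type(SauceEntity.class).id(id).now();
        log.info("datastore load took " + (System.currentTimeMillis() - start) + " ms");
        return e;
    }
}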
30s delays sound a lot like warmup requests, which should be marked clearly in the logs. One thing I've found in the past is that setting any resident instances for low-traffic apps (unintuitively) tended to route a lot of requests to cold instances. Use the default setting and set up a cron task to ping an endpoint once per minute.
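A keep-warm cron entry in cron.xml could look like the following (/keepwarm is a placeholder URL that you'd map to a trivial servlet returning 200):

<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
    <cron>
        <url>/keepwarm</url>
        <description>Hit a cheap handler every minute to keep an instance hot</description>
        <schedule>every 1 minutes</schedule>
    </cron>
</cronentries>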