I have been running a Java high-replication web application on Google AppEngine for some time now. About two days ago - basically out of nowhere - a lot of requests began to fail with HTTP status 500 and error code 121, meaning that the respective GAE instance crashed or was shut down.
Here is an example log entry; I now have tons of these:
2013-02-15 06:44:00.909 /api 500 3770ms 0kb Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17
###.###.###.### - - [14/Feb/2013:22:44:00 -0800] "POST /api HTTP/1.1" 500 0 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17" "###.########.###" ms=3770 cpu_ms=1191 exit_code=121 instance=00c61b117c2c2b8fd8c433bc45a62183829f6484
W 2013-02-15 06:44:00.652
A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. (Error code 121)
The error sometimes occurs right within a 'warmup' request, that is, when a new instance receives its first request. An associated log entry looks like this:
2013-02-15 06:40:02.779 /_ah/warmup 500 2970ms 0kb
0.1.0.3 - - [14/Feb/2013:22:40:02 -0800] "GET /_ah/warmup HTTP/1.1" 500 0 - - "2013-02-14-1438.flox-by-gamua.appspot.com" ms=2971 cpu_ms=671 loading_request=1 exit_code=121 instance=00c61b117c48cb17ea555d1988c0db473c2390
I 2013-02-15 06:40:02.437
This request caused a new process to be started for your application and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
W 2013-02-15 06:40:02.437
A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. (Error code 121)
I have been searching the web for this problem and it looks like it happened before: https://code.google.com/p/googleappengine/issues/detail?id=7348.
Since all relevant issues have been marked as 'resolved', I filed a new GAE production issue here: https://code.google.com/p/googleappengine/issues/detail?id=8812
Edit 2013-04-29: The link above does not work anymore since this issue has been flagged as 'Restricted' by the GAE team.
Unfortunately, my cries for help have gone unnoticed for over two days now. That's why I am, in my utter desperation, asking for your help!
Does anyone know what's causing error code 121? Is there some form of documentation? Is something wrong with my app? Is there a way to nudge the AppEngine team to look into this issue?
Thanks a lot!
Instances are the basic building blocks of App Engine, providing all the resources needed to successfully host your application. At any given time, your application can be running on one or many instances with requests being spread across all of them.
App Engine attempts to keep manual and basic scaling instances running indefinitely. However, at this time there is no guaranteed uptime for manual and basic scaling instances.
With automatic scaling, instances are created on demand to handle requests and automatically turned down when idle. With basic scaling, instances are created on demand to handle requests and automatically shut down when idle, based on the idle_timeout configuration parameter.
Google App Engine provides four possible runtime environments for applications, one for each of four programming languages: Java, Python, PHP, and Go.
I don't have enough points to reply but I have a specific use case that seems interesting:
Everything works as expected, except for one instance, instance=2, which basically cycles
Backends allow you to address a specific instance, such as 2.backendname.appname.appspot.com, and apparently something is wrong with that instance.
I suppose it's reassuring to know that there's one bad instance that's repeatedly failing due to a vague error code, instead of many instances failing randomly due to a vague error code. It'd be more reassuring if that instance were dropped, particularly if it keeps cycling through this pattern.
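For anyone who wants to check whether their failures also concentrate on a single instance, here is a rough sketch that tallies crash lines per instance. It assumes you have exported your request logs as plain text lines ending in the `exit_code=... instance=...` fields shown in the question; the class and method names are made up for illustration and are not part of any App Engine API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an App Engine API: counts crash log lines per instance.
public class ExitCodeTally {
    // Matches the "exit_code=<n> instance=<id>" tail seen in the request log lines.
    private static final Pattern CRASH =
            Pattern.compile("exit_code=(\\d+)\\s+instance=(\\w+)");

    /** Returns, per instance id, how many lines carry the given exit code. */
    public static Map<String, Integer> crashesPerInstance(List<String> lines, int exitCode) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CRASH.matcher(line);
            // Lines without an exit_code field (normal requests) are skipped.
            if (m.find() && Integer.parseInt(m.group(1)) == exitCode) {
                counts.merge(m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tails of the two log lines quoted in the question.
        List<String> sample = Arrays.asList(
            "ms=3770 cpu_ms=1191 exit_code=121 instance=00c61b117c2c2b8fd8c433bc45a62183829f6484",
            "ms=2971 cpu_ms=671 loading_request=1 exit_code=121 instance=00c61b117c48cb17ea555d1988c0db473c2390");
        // Prints one entry per instance id with its crash count.
        System.out.println(crashesPerInstance(sample, 121));
    }
}
```

If one instance id dominates the result, you are probably in the "one bad instance" situation described above; if the counts are spread evenly, the problem is more likely in the app or the platform as a whole.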
Check your log retention limits and make sure you haven't exceeded them. You wouldn't expect that exceeding your log retention limits would cause an exception that makes the instance fail, but after I increased mine, I stopped seeing this error crop up and my backend cron jobs were able to complete.