I am running a Java process on Amazon EC2. It ran for 72 minutes and then suddenly I got "java result 137". That is all; there are no exceptions or any other error messages. I have searched for this error but couldn't find anything useful. What could be causing it, and how can I resolve it? Please let me know.
If a few pods are consistently getting exit code 137 returned to them, that is a sign that you need to increase the amount of memory you allocate to those pods. By manually raising the memory limit on the pods that are under the most strain, you can reduce how often this problem occurs.
When a container (Spark executor) runs out of memory, YARN automatically kills it. This causes the "Container killed on request. Exit code is 137" error.
Exit codes above 128 typically mean the process was terminated by a signal: the reported code is 128 plus the signal number.
Exit code 137 therefore resolves to 128 + 9, where signal 9 is SIGKILL, i.e. the process was forcefully killed. One possible source is a "kill -9" command. In your case, however, this could be an out-of-memory condition on the operating system, which triggers a mechanism called the "OOM Killer" to stop the process that is using the most memory, so that the OS itself stays stable even under such a condition.
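To make the arithmetic concrete, here is a minimal Python sketch that decodes a shell-style exit code into a signal number and name. The helper `describe_exit_code` is a hypothetical name used only for illustration, not part of any library, and the signal names assume a Linux-like system where signal 9 is SIGKILL.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit code (hypothetical helper for illustration)."""
    if code > 128:
        # Convention: codes above 128 encode 128 + signal number.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # This number is not a signal known on the current platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with status {code}"


print(describe_exit_code(137))
```

On Linux this should report signal 9 (SIGKILL) for code 137. Note that the code alone cannot tell you who sent the signal, so it does not distinguish a manual "kill -9" from the OOM Killer.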
See this question for a similar discussion.