bash error code 137 vs 1 when out of memory

Tags: c#, bash, mono

Context

I run the following command in a Linux bash shell:

mono --debug --debugger-agent=transport=dt_socket,address=198.178.155.198:10000 ./Stress.exe

Stress.exe is a C# application.

What happens

At some point the system runs out of memory, which is intended. An error code is returned.

Error code returned (echo $?)

Code 1: when my program throws an exception because it is out of memory.

Code 137: when it is killed by the OS for using too much memory.

Question

Why is it sometimes the OS that kills my application? Why is the result not always the same?

Cher asked Jun 19 '15 17:06



2 Answers

Assuming:

  • Mono is running the SGEN based GC
  • Linux OOM Killer is actually enabled
  • Your Stress.exe is only alloc'ing managed memory, i.e. no native interop, no use of the Marshaling memory allocators, no code flagged unsafe, etc.
  • You are constantly creating objects and never releasing those refs.

Let's talk SGEN. As you alloc objects, they are created in the nursery. When the nursery runs out of memory, the GC does a nursery collection and the live objects are moved to its major heap. If the major heap is full, then more OS memory is requested. You can adjust the amount of initial memory allocated to your mono app and even fix the maximum amount of memory that SGEN can use. Also, managed objects over 8000 bytes are handled by SGEN's Large Object Space manager; that is non-nursery/major-heap based memory, but it is still managed objects/memory.

So normally, when mono needs more space for managed objects, requests an additional block from the OS, and the OS says NO, you see the OutOfMemory exception and your exit code of 1. Your stress test is happy.

But the OOM killer is watching that mono process and raising its score (oom_score) higher and higher. It could strike that mono process at any moment, but I would put the odds on it happening right at the time of a GC sweep, when the app threads are suspended by SGEN but before SGEN actually makes an OS memory request because the nursery is out of managed memory. Thus you get an exit code of 137. 137 & 127 = 9 (equivalently, 137 = 128 + 9), so the mono process was sent a SIGKILL signal (kill -9) and your stress test is not happy.
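The arithmetic above can be checked directly in the shell. This is only a minimal sketch, assuming the usual bash convention that a status above 128 means "128 + signal number"; the function name describe_status is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a shell exit status ($?) into a readable description.
# Assumption: statuses above 128 mean "terminated by signal (status - 128)".
describe_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    elif [ "$status" -eq 0 ]; then
        echo "success"
    else
        echo "exited with code $status"
    fi
}

describe_status 137   # prints: killed by signal 9
describe_status 1     # prints: exited with code 1
```

In the stress test you would call it right after the mono command, e.g. describe_status $?, to tell the SIGKILL case apart from the managed exception case.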

Try this as an experiment:

  • 1) Turn off the OOM killer completely (assuming this is not a live production box you are stressing ;-) ). You should see the "System.OutOfMemoryException" 100% of the time.
  • or 2) Set the oom_adj of just the mono process to -17 and the OOM killer will leave it alone. Just wrap your mono launch in a shell script that grabs its pid and echoes -17 to that process's oom_adj.
  • or 3) If you adjust the oom_adj of the mono process lower (all the way down to -16), then you will see mono catching its own managed memory outage 'more of the time', but it will never be 100% of the time.
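A rough sketch of the wrapper from option 2 could look like the following. It rests on a few assumptions: the function name run_oom_protected is made up, and it writes to /proc/<pid>/oom_score_adj (range -1000 to 1000, where -1000 exempts the process) rather than the older oom_adj, which newer kernels mark as deprecated. Lowering the score normally needs root, so the sketch only warns if the write fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start a command, then ask the OOM killer to leave it alone.
# Assumption: /proc/<pid>/oom_score_adj is available; -1000 plays the role of oom_adj -17.
run_oom_protected() {
    "$@" &                      # launch the command in the background
    local pid=$!                # grab its pid
    if ! { echo -1000 > "/proc/$pid/oom_score_adj"; } 2>/dev/null; then
        echo "warning: could not adjust OOM score for pid $pid (need root?)" >&2
    fi
    wait "$pid"                 # return the command's own exit status
}

# Example call (not run here):
# run_oom_protected mono ./Stress.exe; echo $?
```

There is a short window between the launch and the write in which the process still has its default score, so this is not airtight, but for a stress test that takes a while to fill memory it should be good enough.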

This is not a Mono and/or SGEN/GC related 'issue' at all. Any process consuming more and more memory is subject to being OOM killed. Be it a big fat Oracle database or just an app/daemon that has a memory leak, they are all subject to being killed.

SushiHangover answered Oct 05 '22 06:10


"This article describes the Linux out-of-memory (OOM) killer and how to find out why it killed a particular process. It also provides methods for configuring the OOM killer to better suit the needs of many different environments."

http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

Neil H Watson answered Oct 05 '22 05:10