Update Again
I have tried to create some simple way to reproduce this, but have not been successful.
So far, I have tried various simple array allocations and manipulations, but they all raise a MemoryError rather than the process simply being killed by SIGKILL.
For example:
x = np.asarray(range(999999999))
or:
x = np.empty([100,100,100,100,7])
Both just raise MemoryError, as they should.
I hope to have a simple way to recreate this at some point.
End Update
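These tests may raise MemoryError instead of triggering a kill because Linux's default overcommit heuristic refuses a request that obviously cannot be satisfied immediately. A large but plausible request, by contrast, is usually granted and only runs out later, when its pages are actually touched and the OOM killer has to pick a victim. The sketch below assumes 64-bit Linux and CPython and shows only the first case. try_allocate is a hypothetical helper, and 2**60 bytes is an arbitrary size far beyond the user address space. The second case is deliberately not demonstrated, since it would invoke the OOM killer.

```python
def try_allocate(num_bytes):
    """Hypothetical helper: try one allocation and report the outcome as a string."""
    try:
        buf = bytearray(num_bytes)  # zero-filled buffer of the requested size
    except MemoryError:
        # The allocator refused the request up front, so Python raises MemoryError
        return "MemoryError"
    del buf  # release the buffer right away
    return "allocated"


if __name__ == "__main__":
    # 2**60 bytes exceeds any 64-bit Linux user address space, so this fails at once
    print(try_allocate(2**60))
    # A small request succeeds normally
    print(try_allocate(1024))
```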
I have a Python script running numpy/scipy and some custom C extensions.
On Ubuntu 14.04 under VirtualBox, it runs to completion just fine.
On an Amazon EC2 T2 micro instance, it terminates (after running a while) with the output:
Killed
When I run it under the Python debugger, the signal is not caught and the debugger exits as well.
Running under strace, I get:
munmap(0x7fa5b7fa6000, 67112960) = 0
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5b7fa6000
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5affa4000
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5abfa3000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a7f22000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a3ea1000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa59fe20000
gettimeofday({1406518336, 306209}, NULL) = 0
gettimeofday({1406518336, 580022}, NULL) = 0
+++ killed by SIGKILL +++
When running under gdb and trying to catch SIGKILL, I get:
[Thread 0x7fffe7148700 (LWP 28022) exited]
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) where
No stack.
Running Python's trace module (python -m trace --trace), I get:
defmatrix.py(292): if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(293): ndim = self.ndim
defmatrix.py(294): if (ndim == 2):
defmatrix.py(295): return
defmatrix.py(336): return out
--- modulename: linalg, funcname: norm
linalg.py(2052): x = asarray(x)
--- modulename: numeric, funcname: asarray
numeric.py(460): return array(a, dtype, copy=False, order=order)
I can't think of anything else at the moment to figure out what is going on.
I suspect it may be running out of memory (it is an AWS Micro instance), but I can't figure out how to confirm or rule that out.
Is there another tool that might help pinpoint exactly where the program is stopping? Or am I using one of the above tools the wrong way for this problem?
The Amazon EC2 T2 micro instance has no swap space defined by default, so I added a 4GB swap file and was able to run the program to completion.
However, I would still like a way to run the program so that it terminates with a message closer to "Not Enough Memory" rather than "Killed".
If anyone has any suggestions, they would be appreciated.
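One per-process option that needs no root access or reboot is to lower the process's address-space limit (RLIMIT_AS) with the standard resource module. Allocations past the cap then fail inside the process and surface as a Python MemoryError, instead of succeeding and later drawing the OOM killer. The sketch below assumes Linux (it reads /proc/self/status) and CPython 3. The helper names and the 256 MiB headroom are hypothetical choices for illustration. From a shell, ulimit -v sets the same limit in kB before the program starts.

```python
import resource


def current_vm_size_bytes():
    """Return this process's virtual memory size (VmSize) from /proc (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) * 1024  # the value is reported in kB
    raise RuntimeError("VmSize not found in /proc/self/status")


def run_with_address_space_cap(func, extra_bytes):
    """Call func() with the RLIMIT_AS soft limit set to current usage + extra_bytes.

    Allocations that would grow the address space past the cap fail, which
    CPython reports as MemoryError. The previous soft limit is restored afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    cap = current_vm_size_bytes() + extra_bytes
    if hard != resource.RLIM_INFINITY:
        cap = min(cap, hard)  # an unprivileged process cannot exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (cap, hard))
    try:
        return func()
    finally:
        # Raising the soft limit back up to (at most) the hard limit is always allowed
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


if __name__ == "__main__":
    try:
        # Ask for 1 GiB while only about 256 MiB of extra address space is allowed
        run_with_address_space_cap(lambda: bytearray(1024**3), 256 * 1024**2)
    except MemoryError:
        print("MemoryError: not enough memory under the configured cap")
```

RLIMIT_AS caps virtual memory, not resident memory. NumPy, BLAS thread pools and thread stacks can reserve far more address space than they actually touch, so the cap usually needs generous headroom. In a real program it would more typically be set once at startup for the whole run.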
A SIGKILL means the Python process was terminated by your system. The most common reason I have seen is low resources (usually not enough RAM), so monitor how much memory the program is using. If your code uses a library that takes an n_jobs parameter (such as scikit-learn or joblib), you might also try explicitly setting n_jobs to a low number, as CPU over-subscription could be an issue.
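SIGKILL cannot be caught, so a program cannot report its memory use at the moment it is killed. One way to monitor is to log periodically and flush every line. The last line written before "Killed" then shows how large the process had grown. The sketch below assumes Linux and reads VmRSS (current resident memory) and VmHWM (peak resident memory) from /proc/self/status. read_kb and start_memory_logger are hypothetical helper names, and the 5-second default interval is arbitrary.

```python
import sys
import threading


def read_kb(path, field):
    """Return the value (in kB) of `field` from a /proc file such as /proc/self/status."""
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name == field:
                return int(rest.split()[0])
    raise KeyError(field)


def start_memory_logger(interval_s=5.0, stream=sys.stderr):
    """Log this process's current and peak RSS every interval_s seconds.

    Runs in a daemon thread. Each line is flushed immediately so the last
    reading survives even if the process is later killed with SIGKILL.
    Returns an Event; call .set() on it to stop logging.
    """
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            rss = read_kb("/proc/self/status", "VmRSS")
            peak = read_kb("/proc/self/status", "VmHWM")
            print(f"[mem] rss={rss // 1024} MiB peak={peak // 1024} MiB",
                  file=stream, flush=True)
            stop.wait(interval_s)  # sleep, but wake early if stop is set

    threading.Thread(target=loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    stopper = start_memory_logger(interval_s=1.0)
    data = [bytearray(10 * 1024**2) for _ in range(20)]  # grow by about 200 MiB
    threading.Event().wait(2.5)  # let a few readings appear
    stopper.set()
```

Calling start_memory_logger() at the top of the real script and redirecting stderr to a file leaves a record that survives the kill.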
The most likely cause is that your program was using too much memory. Rather than risk things breaking when memory allocations started failing, the system sent a kill signal to the process that was using too much memory.
In your case, look for parts of your algorithm that could be consuming a lot of memory. When an operation runs out of memory, Python raises a MemoryError. If you get an unexpected MemoryError even though you think you should have plenty of RAM available, it might be because you are using a 32-bit Python installation, which can only address a few gigabytes per process.
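To check whether the running interpreter is a 32-bit or 64-bit build, you can inspect the size of a C pointer with the standard struct module. The sketch below does that. python_pointer_bits is a hypothetical helper name.

```python
import struct
import sys


def python_pointer_bits():
    """Return 32 or 64: the pointer width of the running Python interpreter."""
    return struct.calcsize("P") * 8  # "P" is the native void* format


if __name__ == "__main__":
    bits = python_pointer_bits()
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds
    print(f"{bits}-bit Python, sys.maxsize = {sys.maxsize}")
```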
What is a MemoryError? In plain terms, it means exactly what it says: your code has run out of memory (RAM) to execute. When this error occurs, it is often because you have loaded an entire dataset into memory at once. For large datasets, you will want to use batch processing.
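Batch processing means consuming the data in fixed-size pieces so that only one piece is in memory at a time. The sketch below applies this to computing a mean over an arbitrarily long iterable. batched and streaming_mean are hypothetical names, and the batch size of 10,000 is arbitrary. On Python 3.12 and later, itertools.batched offers the same chunking, yielding tuples instead of lists.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items without materializing the whole iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def streaming_mean(values, batch_size=10_000):
    """Mean of a possibly huge iterable, holding only one batch in memory at a time."""
    total = 0.0
    count = 0
    for batch in batched(values, batch_size):
        total += sum(batch)
        count += len(batch)
    if count == 0:
        raise ValueError("mean of empty input")
    return total / count


if __name__ == "__main__":
    # range() is lazy, so the million values are never held in memory together
    print(streaming_mean(range(1, 1_000_001)))
```

The same pattern works for lines from a file or rows from a database cursor. Only the running totals and the current batch occupy memory.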
Why does Python automatically exit a script when it's done? Python executes a code block line by line: it imports dependencies, stores function and class definitions in memory, and runs the statements in order, including loops and calls back into the defined functions and classes.
When Python reaches the end of the file (EOF) after executing all the code without raising an exception, it exits "gracefully." To run cleanup code when the interpreter exits normally, including after an uncaught exception, we can use the built-in atexit module.
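atexit cannot help with the OOM killer, though. The Python documentation states that registered functions are not called when the program is killed by a signal not handled by Python, and SIGKILL can never be handled. The sketch below checks this by running a small child interpreter twice: once exiting normally and once killing itself with SIGKILL. CHILD_CODE and run_child are hypothetical names, and the sketch assumes a POSIX system.

```python
import subprocess
import sys

# Child program: registers an atexit handler, then either exits normally
# or sends SIGKILL to itself, depending on its first argument.
CHILD_CODE = """
import atexit, os, signal, sys
atexit.register(lambda: print("atexit handler ran", flush=True))
if sys.argv[1] == "kill":
    os.kill(os.getpid(), signal.SIGKILL)
"""


def run_child(mode):
    """Run CHILD_CODE in a fresh interpreter; return (returncode, stdout)."""
    result = subprocess.run([sys.executable, "-c", CHILD_CODE, mode],
                            capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    for mode in ("exit", "kill"):
        code, out = run_child(mode)
        # subprocess reports death by signal N as return code -N
        print(f"{mode}: return code {code}, handler ran: {'atexit handler ran' in out}")
```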
It sounds like you've run into the dreaded Linux OOM Killer. When the system completely runs out of memory and the kernel absolutely needs to allocate memory, it kills a process rather than crashing the entire system.
Look in the syslog for confirmation of this. A line similar to:
kernel: [884145.344240] mysqld invoked oom-killer:
followed some time later by:
kernel: [884145.344399] Out of memory: Kill process 3318
should be present (this example mentions mysqld specifically).
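To search a log for these messages from Python instead of by eye, a regular expression can match both lines quoted above. The sketch below assumes the Ubuntu default log path /var/log/syslog, which usually needs root or adm-group access to read. The exact wording varies between kernel versions, and newer kernels print "Killed process", so the pattern accepts both forms. OOM_PATTERN and find_oom_events are hypothetical names.

```python
import re
import sys

# Matches "<proc> invoked oom-killer" and "Out of memory: Kill process" /
# "Out of memory: Killed process"; wording differs across kernel versions.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill(?:ed)? process")


def find_oom_events(lines):
    """Return the lines that look like OOM-killer messages, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    with open(path, errors="replace") as log:
        for event in find_oom_events(log):
            print(event)
```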
You can add these lines to your /etc/sysctl.conf file to effectively disable the OOM killer:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
Then reboot (or run sysctl -p to apply the settings without rebooting). Now the original, memory-hungry process should fail to allocate memory and, hopefully, raise the proper exception.
Setting overcommit_memory to 2 means that Linux won't overcommit memory, so memory allocations will fail if there isn't enough memory for them. See this answer for details on what effect overcommit_ratio has: https://serverfault.com/a/510857
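The kernel's overcommit documentation gives the strict-mode limit (overcommit_memory = 2) as swap plus overcommit_ratio percent of physical RAM, ignoring huge pages and the alternative overcommit_kbytes setting. With no swap, overcommit_ratio = 100 therefore allows commitments up to roughly the size of RAM. Adding a swap file raises the limit by the swap size. The sketch below shows that arithmetic and reads the live CommitLimit and Committed_AS values from /proc/meminfo. The helper names are hypothetical, and the numbers in the usage line are illustrative, not measured on an EC2 instance. In modes 0 and 1, CommitLimit is reported but not enforced.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """CommitLimit in kB for vm.overcommit_memory = 2, ignoring huge pages."""
    return mem_total_kb * overcommit_ratio // 100 + swap_total_kb


def read_meminfo_kb(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: value_in_kB}; fields without a kB unit are skipped."""
    values = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            parts = rest.split()
            if len(parts) == 2 and parts[1] == "kB":
                values[name] = int(parts[0])
    return values


if __name__ == "__main__":
    # Illustrative numbers: 1,000,000 kB of RAM, a 4 GiB swap file, ratio 100
    print(commit_limit_kb(1_000_000, 4 * 1024 * 1024, 100))
    info = read_meminfo_kb()
    print("CommitLimit:", info["CommitLimit"], "kB; Committed_AS:", info["Committed_AS"], "kB")
```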