My program, which I've run numerous times on different clusters, suddenly stops. The log:
15/04/20 19:19:59 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 15.0 (TID 374) in 61 ms on ip-XXX.compute.internal (16/24)
15/04/20 19:19:59 INFO storage.BlockManagerInfo: Added rdd_44_14 in memory on ip-XXX.compute.internal:37999 (size: 16.0 B, free: 260.6 MB)
Killed
What does "Killed" mean and why does it occur? There are no other errors.
"Killed" usually means that the OS has terminated the process by sending it a SIGKILL signal. SIGKILL cannot be caught, blocked, or ignored, so the process is terminated immediately, with no chance to clean up or log an error. That is why nothing else appears in your output. One common sender is the OOM (out-of-memory) killer: if the OS decides that memory is getting dangerously low, it can pick a process to kill to try to free some memory.
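To see what a SIGKILL looks like from the outside, here is a minimal Python sketch for a POSIX system. It is not taken from your program: the helper names `kill_child_and_report` and `describe_exit` are made up for illustration. It starts a child process, sends it SIGKILL, and inspects the result. Python's `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def kill_child_and_report():
    """Start a sleeping child, SIGKILL it, and return its return code."""
    # The child would otherwise sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # SIGKILL cannot be caught or ignored, so the child gets no chance to react.
    child.send_signal(signal.SIGKILL)
    child.wait()
    return child.returncode


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess encodes "terminated by signal N" as -N.
        return "terminated by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    print(describe_exit(kill_child_and_report()))
```

When the same thing happens to a command started from a shell, the shell typically prints "Killed" and reports exit status 137 (128 + 9, where 9 is the signal number of SIGKILL).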
Without more information, it's impossible to tell whether your process was killed because of memory problems or for some other reason. Details that would help diagnose what's going on include: How long was the process running before it was killed? Can you enable and provide more verbose debug output from the process? Is the termination associated with any particular pattern of communication or processing activity?
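If the machine runs Linux, one way to check the memory hypothesis is to look in the kernel log (for example the output of `dmesg`) for a record of the OOM killer. The sketch below assumes the log contains a phrase of the form "Killed process <pid> (<name>)", which is what Linux kernels have commonly written. The exact wording varies between kernel versions, so treat the pattern as an assumption to adjust. The function name `find_oom_kills` and the sample log line are made up for illustration.

```python
import re

# Assumed pattern: matches "Killed process 4321 (java)" anywhere in a line.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) pairs found in kernel log text."""
    return [
        (int(match.group(1)), match.group(2))
        for match in OOM_KILL.finditer(log_text)
    ]


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real machine.
    sample = "[12345.678] Out of memory: Killed process 4321 (java) total-vm:1024kB\n"
    print(find_oom_kills(sample))
```

If the PID of your process shows up there with a timestamp matching the time it died, memory pressure is the likely cause. If nothing shows up, the SIGKILL probably came from somewhere else, such as a user, a supervisor, or a resource manager.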