 

How can a process die in a way that Process.wait wouldn't notice?

I have this Ruby script to manage que processes (que doesn't support multi-process operation; see discussion here):

#!/usr/bin/env ruby

cluster_size = 2    
puts "starting Que cluster with #{cluster_size} workers"; STDOUT.flush

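# Initialize before installing the traps so that a signal arriving early
# doesn't call each on nil.
@pids = []

%w[INT TERM].each do |signal|
  trap(signal) do
    @pids.each { |pid| Process.kill(signal, pid) }
  end
end
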
cluster_size.to_i.times do |n|
  puts "Starting Que daemon #{n}"; STDOUT.flush
  @pids << Process.spawn("que --worker-count $MAX_THREADS")
end

Process.waitall

puts "Que cluster has shut down"; STDOUT.flush

The script has been working well for a couple months. The other day I found things in a state where the script was running, but both child processes were dead.

I tried to replicate this: I killed the children with various signals and had them raise exceptions. In every case, the script noticed that the process had died and then exited itself.

How could the child process have died without the parent script knowing?

John Bachir asked Sep 27 '18 21:09



2 Answers

How could the child process have died without the parent script knowing?

My guess is that the child process turned into a zombie and was missed by Process.waitall. Did you check whether the child processes were zombies when it happened?

The zombie: If you have zombie processes, it means they have not been waited for by their parent (check the PPID with ps -l). In the end you have three choices: fix the parent process (make it wait), kill the parent, or get over it.
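The zombie check can also be done from Ruby by reading the process state that ps reports. This is a minimal sketch, assuming a ps that accepts `-o pid=,ppid=,stat=`; the helper names `parse_ps` and `zombie_children` are hypothetical and not part of Ruby or que.

```ruby
# Parse the output of `ps -o pid=,ppid=,stat=` into an array of hashes.
# Lines with fewer than three fields are skipped.
def parse_ps(output)
  output.each_line.map(&:split).select { |f| f.size >= 3 }.map do |pid, ppid, stat|
    { pid: pid.to_i, ppid: ppid.to_i, stat: stat }
  end
end

# Select the children of parent_pid whose state starts with "Z" (zombie).
def zombie_children(rows, parent_pid)
  rows.select { |r| r[:ppid] == parent_pid && r[:stat].start_with?("Z") }
end

# Usage (assumes `ps` is on PATH and supports these options):
#   rows = parse_ps(`ps -A -o pid=,ppid=,stat=`)
#   p zombie_children(rows, Process.pid)
```

Running this from inside the supervisor script (or with its PID substituted for `Process.pid`) would show whether the que workers are sitting in the zombie state.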

Could you check your list of signals and trap them?

You can list all available signals (the output below is from Windows):

Signal.list
=> {"EXIT"=>0, "INT"=>2, "ILL"=>4, "ABRT"=>22, "FPE"=>8, "KILL"=>9, "SEGV"=>11, "TERM"=>15}

You could try trapping one of them, e.g. INT (note: you can have only one trap per signal, and Ruby refuses to trap reserved signals such as SEGV):

Signal.trap('INT') { throw :sigint }

catch :sigint do
    start_what_you_need  # placeholder for your own code
end
puts 'OMG! Got an INT!'

Since your question is a general one, it is hard to give you a specific answer.

tukan answered Sep 28 '22 17:09


Zombies are not the only possible cause for this problem -- stopped children may not be reported for a variety of reasons.

The existence of a zombie typically means that the parent has not properly waited on it. The posted code looks OK, though, so unless there's a framework bug lurking somewhere I'd want to look beyond the zombie apocalypse to explain this problem.

In contrast to zombies, which have already terminated and only linger until their parent collects their exit status, frozen processes are still alive but have stopped responding for some reason (waiting for an external process or I/O operation, memory problems, long or infinite looping, slow database operations, etc.).
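To tell "still running, possibly frozen" apart from "already exited", the parent can poll a child without blocking. This is a minimal sketch using `Process.waitpid2` with `Process::WNOHANG`; the helper name `poll_child` and the `:running`/`:gone` symbols are assumptions made for illustration.

```ruby
# Non-blocking check on a single child.
# Returns :running if the child has not exited yet (it may be healthy or
# frozen), :gone if the pid is not a child of this process or has already
# been reaped, and otherwise the Process::Status of the exited child.
def poll_child(pid)
  reaped, status = Process.waitpid2(pid, Process::WNOHANG)
  reaped.nil? ? :running : status
rescue Errno::ECHILD
  :gone
end
```

A supervisor could call this periodically for each worker pid and log the result, which gives a record of when each child actually went away. A child that keeps reporting `:running` while doing no work is frozen rather than dead.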

On some platforms, Ruby can add a flag requesting return of stopped children that haven't been reported, using the following syntax:

Process.waitpid(pid, Process::WUNTRACED)

AFAIK waitall doesn't have a version that accepts flags, so you'd have to aggregate this yourself. You can use pid = -1 to wait for any child process (the default if you omit pid) or pid = 0 to wait for any child with the same process group ID as the calling process.
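Aggregating the results yourself could look roughly like this. It is a sketch, assuming a platform where `Process::WUNTRACED` is defined; the method name `wait_all_with_stops` and its `log:` parameter are hypothetical.

```ruby
# Wait for every pid in `pids`, also reporting children that have been
# stopped (for example by SIGSTOP) rather than silently blocking on them.
# Returns a hash of pid => Process::Status for the children that ended.
def wait_all_with_stops(pids, log: $stdout)
  remaining = pids.dup
  exited = {}
  until remaining.empty?
    begin
      # -1 waits for any child; WUNTRACED also returns stopped children.
      pid, status = Process.waitpid2(-1, Process::WUNTRACED)
    rescue Errno::ECHILD
      break # no children left to wait for
    end
    if status.stopped?
      log.puts "child #{pid} stopped by signal #{status.stopsig}"
    else
      log.puts "child #{pid} finished: #{status.inspect}"
      exited[pid] = status
      remaining.delete(pid)
    end
  end
  exited
end
```

In the question's script this would replace `Process.waitall`, e.g. `wait_all_with_stops(@pids)`. A child that stays stopped still keeps the loop waiting, but the stop is now logged, so the state is visible.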

See documentation here.

Craig.Feied answered Sep 28 '22 17:09