How can a process die in a way that Process.wait wouldn't notice?

Tags: process, ruby, signals

I have this Ruby script to manage que processes, since que doesn't support multiple processes (see discussion here):

#!/usr/bin/env ruby

cluster_size = 2    
puts "starting Que cluster with #{cluster_size} workers"; STDOUT.flush

%w[INT TERM].each do |signal|
  trap(signal) do
    @pids.each{|pid| Process.kill(signal, pid) }
  end
end

@pids = []
cluster_size.to_i.times do |n|
  puts "Starting Que daemon #{n}"; STDOUT.flush
  @pids << Process.spawn("que --worker-count $MAX_THREADS")
end

Process.waitall

puts "Que cluster has shut down"; STDOUT.flush

The script has been working well for a couple of months. The other day I found things in a state where the script was still running, but both child processes were dead.

I tried to replicate this: I killed the children with various signals and made them raise exceptions. In every case, the script noticed that the children had died and exited itself.
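A stripped-down version of that experiment might look like the sketch below (an assumption: sleep stands in for que). Unlike the script above, it keeps the array that Process.waitall returns, which records how each child ended:

```ruby
# Two stand-in children, started the same way the cluster script starts que.
pids = 2.times.map { Process.spawn("sleep", "30") }

# Kill them with different signals, as in the experiment.
Process.kill("TERM", pids[0])
Process.kill("KILL", pids[1])

# Process.waitall blocks until every child is gone and returns [[pid, status], ...].
Process.waitall.each do |pid, status|
  how = status.signaled? ? "signal #{Signal.signame(status.termsig)}" : "exit #{status.exitstatus}"
  puts "child #{pid} ended by #{how}"
end
```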

How could the child process have died without the parent script knowing?


asked Sep 27 '18 21:09

John Bachir

2 Answers

How could the child process have died without the parent script knowing?

My guess is that the child processes turned into zombies and were missed by Process.waitall. Did you check whether the child processes are zombies when it happens?

About zombies: if you have zombie processes, it means they have not been waited for by their parent (check the PPID with ps -l). In the end you have three choices: fix the parent process (make it wait), kill the parent, or get over it.
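To run that check from Ruby instead of with ps -l, here is a minimal sketch. It assumes Linux, where /proc/<pid>/stat exposes each process's state as a single letter (Z for a zombie, T for a stopped process); process_state and zombie? are hypothetical helper names, not part of Ruby.

```ruby
# Hypothetical helpers, not part of Ruby's standard library.
# Assumes Linux, where /proc/<pid>/stat looks like "1234 (que) S 1 ...".

# Returns the one-letter state of pid ("R", "S", "T", "Z", ...),
# or nil if there is no such process any more.
def process_state(pid)
  stat = File.read("/proc/#{pid}/stat")
  # The command name sits in parentheses and may itself contain spaces
  # or ")", so split after the LAST ")"; the next field is the state.
  stat.rpartition(")").last.split.first
rescue Errno::ENOENT, Errno::ESRCH
  nil
end

# True if pid has exited but its parent has not waited for it yet.
def zombie?(pid)
  process_state(pid) == "Z"
end

# Demo: a child that exits right away lingers as a zombie until reaped.
pid = fork { exit!(0) } # exit! skips at_exit handlers in the child
sleep 0.5               # give the child time to exit
puts "before wait: #{process_state(pid).inspect}"
Process.wait(pid)       # reaping removes the /proc entry
puts "after wait:  #{process_state(pid).inspect}"
```

Because /proc is readable by any process, the same helpers could also be run from a separate Ruby process against the pids of the que children.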

Could you check the list of available signals and trap the relevant one?

You can list all available signals with Signal.list (the output below is from Windows):

Signal.list
=> {"EXIT"=>0, "INT"=>2, "ILL"=>4, "ABRT"=>22, "FPE"=>8, "KILL"=>9, "SEGV"=>11, "TERM"=>15}

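Could you try to trap it, e.g. INT? Note that there can be only one trap handler per signal, and that Ruby refuses to trap SEGV, BUS, ILL, FPE and VTALRM (Signal.trap raises ArgumentError for these reserved signals), so the pattern below uses USR1:

Signal.trap('USR1') { throw :sigusr1 }

result = catch(:sigusr1) do
  start_what_you_need # placeholder for your own code
  :finished
end
puts 'Got a USR1!' unless result == :finished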

Since your question is a general one, it is hard to give you a specific answer.


answered Sep 28 '22 17:09

tukan

Zombies are not the only possible cause of this problem; stopped children, for example, are not reported by a plain wait at all.

The existence of a zombie typically means that the parent has not properly waited on them. The posted code looks OK, though, so unless there's a framework bug lurking somewhere I'd want to look beyond the zombie apocalypse to explain this problem.

In contrast to zombies, which have already exited and are only waiting for their parent to reap them, frozen processes are still alive with an intact parent but have stopped responding for some reason (waiting on an external process or an I/O operation, memory problems, long or infinite loops, slow database operations, etc.).

On some platforms, you can pass a flag asking Ruby to also return stopped children whose status has not been reported yet:

Process.waitpid(pid, Process::WUNTRACED)
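A minimal sketch of what WUNTRACED changes, assuming a POSIX system (SIGSTOP and Process::WUNTRACED are not available on Windows). The sleep child is a stand-in for a que worker: it is stopped with SIGSTOP, which a plain wait never reports, but Process.waitpid2 with WUNTRACED returns a Process::Status whose stopped? is true.

```ruby
# A stand-in child (assumption: any long-running command will do).
pid = Process.spawn("sleep", "30")

# SIGSTOP cannot be caught or ignored; the child is now stopped, not dead.
Process.kill("STOP", pid)

# Without WUNTRACED this call would block until the child exits.
# With it, the stop is reported as soon as it happens.
_, status = Process.waitpid2(pid, Process::WUNTRACED)
puts "stopped? #{status.stopped?}, stopsig #{Signal.signame(status.stopsig)}"

# Clean up: SIGKILL also terminates a stopped process; then reap it.
Process.kill("KILL", pid)
_, status = Process.waitpid2(pid)
puts "signaled? #{status.signaled?}, termsig #{Signal.signame(status.termsig)}"
```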

AFAIK waitall doesn't accept any flags, so you'd have to build the equivalent yourself around waitpid, passing pid = -1 to wait for any child process (the default if you omit pid) or pid = 0 to wait for any child in the same process group as the calling process.
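One way to build it yourself, as a sketch: a hypothetical wait_all_reporting_stops helper (the name is made up) that loops on Process.wait2 with pid = -1 and WUNTRACED until no children are left, yielding every event so the cluster script could log it. It assumes a POSIX system.

```ruby
# Hypothetical helper: like Process.waitall, but also yields stopped children.
# Assumes POSIX (Process::WUNTRACED is not defined on Windows).
# Yields pid, status for every event; returns [[pid, status], ...] for
# children that actually terminated.
def wait_all_reporting_stops
  finished = []
  loop do
    begin
      # pid = -1: wait for any child of this process.
      pid, status = Process.wait2(-1, Process::WUNTRACED)
    rescue Errno::ECHILD
      break # no children left
    end
    yield pid, status if block_given?
    # A stopped child is still alive, so keep waiting for it to terminate.
    finished << [pid, status] unless status.stopped?
  end
  finished
end
```

In the cluster script, Process.waitall could then be replaced with something like wait_all_reporting_stops { |pid, status| puts "child #{pid}: #{status.inspect}"; STDOUT.flush }, so that a child that stops rather than exits at least leaves a trace in the log. Like waitall, it still blocks for as long as a stopped child stays stopped.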

See documentation here.

answered Sep 28 '22 17:09

Craig.Feied
