I use BrB to share a data source among several worker processes in Ruby 1.9, which I fork with `Process#fork`:
```ruby
Thread.abort_on_exception = true

fork do
  puts "Initializing data source process... (PID: #{Process.pid})"
  data = DataSource.new(files)
  BrB::Service.start_service(:object => data, :verbose => false, :host => host, :port => port)
  EM.reactor_thread.join
end
```
The workers are forked as follows:
```ruby
8.times do |t|
  fork do
    data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)
    puts "Launching #{threads_num} worker threads... (PID: #{Process.pid})"
    threads = []
    threads_num.times { |i|
      threads << Thread.new {
        while true
          begin
            worker = Worker.new(data, config)
          rescue OutOfTargetsError
            break
          rescue Exception => e
            puts "An unexpected exception was caught: #{e.class} => #{e}"
            sleep 5
          end
        end
      }
    }
    threads.each { |t| t.join }
    data.stop_service
    EM.stop
  end
end
```
This works pretty much perfectly, but after around 10 minutes of running I get the following error:
```
bootstrap.rb:47:in `join': deadlock detected (fatal)
    from bootstrap.rb:47:in `block in <main>'
    from bootstrap.rb:39:in `fork'
    from bootstrap.rb:39:in `<main>'
```
This error doesn't tell me much about where the deadlock actually happens; it only points me to the `join` on the EventMachine thread.

How can I trace the point at which the program locks up?
It's locking up on `join` in the parent thread; that information is accurate.
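The reason the trace names `join` is that Ruby's deadlock detector raises its `fatal` error in the main thread once every live thread is blocked with nothing left to wake it, even when the thread that is really stuck is a different one. A minimal sketch of that situation (not the asker's code; `queue`, `child` and `caught` are illustrative names): a child thread blocks forever on an empty `Queue` while the main thread joins it. The exact message varies by version: Ruby 1.9 prints `deadlock detected`, newer versions print `No live threads left. Deadlock?`.

```ruby
require 'thread' # Queue lives here on Ruby 1.9; the require is harmless on newer versions

queue = Queue.new
child = Thread.new { queue.pop } # blocks forever: nothing is ever pushed

caught = nil
begin
  child.join # the main thread now waits on a thread that can never finish
rescue Exception => e # the deadlock error has class `fatal`, a subclass of Exception
  caught = e
end

# The error surfaces at the join in the main thread, not inside `child`.
puts "#{caught.class}: #{caught.message.lines.first.strip}"
```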
To trace where it's locking up in the child thread, try wrapping the thread's work in a `Timeout.timeout` block. You'll need to temporarily remove the catch-all `rescue Exception` so that the timeout exception can propagate instead of being swallowed.
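A minimal sketch of that idea using the standard library's `Timeout` module. `stalled_work` and `run_with_timeout` are hypothetical names: `stalled_work` just sleeps to simulate a call that never returns, and the 0.5 second limit is arbitrary. The block itself must not contain a `rescue Exception`, or it would swallow the timeout before `run_with_timeout` sees it. Which exception carries the frames from inside the stalled block can differ between Ruby versions, so the sketch prints the backtrace of the error and, when present, of its `cause` (available from Ruby 2.1).

```ruby
require 'timeout'

# Hypothetical stand-in for `Worker.new(data, config)`: it stalls the way a
# call that never returns would (a finite sleep keeps this demo short).
def stalled_work
  sleep 10
  :finished
end

# Runs one unit of work under a time limit. Timeout::Error is rescued by
# name here, so it is reported instead of vanishing into a catch-all.
def run_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error => e
  warn "Work timed out after #{seconds}s: #{e.message}"
  # Print the first few recorded frames of the error and of its cause.
  cause = e.respond_to?(:cause) ? e.cause : nil
  [e, cause].compact.each do |err|
    frames = err.backtrace || []
    warn frames.first(5).join("\n")
  end
  :timed_out
end

result = nil
worker = Thread.new { result = run_with_timeout(0.5) { stalled_work } }
worker.join
puts result # prints "timed_out"
```

Work that finishes within the limit is unaffected: `run_with_timeout(1) { :ok }` simply returns `:ok`.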
Currently the parent thread tries to join all threads in order, blocking until each one has finished. However, each thread only finishes (and can therefore be joined) when it hits an `OutOfTargetsError`. The deadlock might be avoided by using short-lived threads and moving the `while` loop into the parent. There are no guarantees, but maybe something like this will work:
```ruby
8.times do
  fork do
    running = true
    Signal.trap("INT") do
      puts "Interrupt signal received, waiting for threads to finish..."
      running = false
    end

    data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)
    puts "Launching max #{threads_num} worker threads... (PID: #{Process.pid})"
    threads = []

    while running
      # Start new threads until we have threads_num running
      until threads.length >= threads_num do
        threads << Thread.new {
          begin
            Worker.new(data, config)
          rescue OutOfTargetsError
            # No more targets for this thread; just let it finish
          rescue Exception => e
            puts "An unexpected exception was caught: #{e.class} => #{e}"
            sleep 5
          end
        }
      end

      # Make sure the parent thread doesn't spin too much
      sleep 1

      # Join finished threads (status is false or nil once a thread has ended)
      finished_threads = threads.reject(&:status)
      threads -= finished_threads
      finished_threads.each(&:join)
    end

    # Wait for the threads still running when the loop stopped
    threads.each(&:join)
    data.stop_service
    EM.stop
  end
end
```
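The loop in this answer relies on `Thread#status` returning `false` (finished normally) or `nil` (terminated by an exception) once a thread is done, so `reject(&:status)` picks out exactly the threads that can be joined without blocking. The same keep-N-running, reap-the-finished pattern can be tried without BrB in the self-contained sketch below. The queue of integers, `max_threads`, and the squaring "work" are hypothetical stand-ins for the real targets, `threads_num`, and `Worker`, and the loop ends when the work runs out rather than on SIGINT.

```ruby
require 'thread' # Queue lives here on Ruby 1.9; harmless on newer versions

# Hypothetical work items standing in for the real targets.
targets = Queue.new
10.times { |n| targets << n }

results = Queue.new
max_threads = 3
threads = []

# Keep up to max_threads short-lived threads running until all work is done.
until targets.empty? && threads.empty?
  while threads.length < max_threads && !targets.empty?
    threads << Thread.new do
      begin
        n = targets.pop(true) # non-blocking pop; raises ThreadError when empty
        sleep 0.01            # simulated work
        results << n * n
      rescue ThreadError
        # Another thread took the last item first; nothing left to do.
      end
    end
  end

  sleep 0.05 # avoid spinning the loop

  # Only finished threads have a falsy status, so joining them never blocks.
  finished = threads.reject(&:status)
  threads -= finished
  finished.each(&:join)
end

squares = []
squares << results.pop until results.empty?
puts squares.sort.inspect # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```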
I had the same problem and solved it by using this code snippet:
```ruby
# Wait for all threads (other than the current thread and
# main thread) to stop running.
# Assumes that no new threads are started while waiting
def join_all
  main = Thread.main        # The main thread
  current = Thread.current  # The current thread
  all = Thread.list         # All threads still running
  # Now call join on each thread
  all.each { |t| t.join unless t == current or t == main }
end
```
Source: The Ruby Programming Language, O'Reilly (2008)