
How to trace a deadlock in Ruby

I use BrB to share a data source between several worker processes in Ruby 1.9, which I fork with Process.fork:

Thread.abort_on_exception = true

fork do
  puts "Initializing data source process... (PID: #{Process.pid})"
  data = DataSource.new(files)

  BrB::Service.start_service(:object => data, :verbose => false, :host => host, :port => port)
  EM.reactor_thread.join
end

The workers are forked as follows:

8.times do |t|  
  fork do
    data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)

    puts "Launching #{threads_num} worker threads... (PID: #{Process.pid})"    

    threads = []
    threads_num.times { |i|
      threads << Thread.new {
        while true
          begin
            worker = Worker.new(data, config)

          rescue OutOfTargetsError
            break

          rescue Exception => e
            puts "An unexpected exception was caught: #{e.class} => #{e}"
            sleep 5

          end
        end
      }
    }
    threads.each { |t| t.join }

    data.stop_service
    EM.stop
  end
end

This works pretty much perfectly, but after around 10 minutes of running I get the following error:

bootstrap.rb:47:in `join': deadlock detected (fatal)
    from bootstrap.rb:47:in `block in <main>'
    from bootstrap.rb:39:in `fork'
    from bootstrap.rb:39:in `<main>'

This error doesn't tell me much about where the deadlock is actually happening; it only points me to the join on the EventMachine thread.

How do I trace back at which point the program locks up?

Patrick Glandien asked Jun 30 '10 20:06


2 Answers

The deadlock is reported at the join in the parent thread, and that information is accurate. To trace where a child thread is locking up, try wrapping the thread's work in a timeout block. You'll need to temporarily remove the catch-all rescue so that the timeout exception can propagate.
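As a minimal sketch of the timeout idea, assuming a hypothetical stuck_work method in place of the real Worker.new(data, config) call and a hypothetical StuckError class: passing a custom exception class to Timeout.timeout raises it inside the blocked thread, so its backtrace should point at the call that was still running when the limit expired.

```ruby
require 'timeout'

# Hypothetical error class, used only so the timeout is easy to tell apart
# from other exceptions.
class StuckError < StandardError; end

# Hypothetical stand-in for the real work (Worker.new(data, config) in the
# question); it blocks for far longer than the limit.
def stuck_work
  sleep 10
end

# Runs the block under a time limit. Returns nil if the block finishes in
# time, otherwise the backtrace of the code that was interrupted.
def trace_if_stuck(limit_seconds)
  Timeout.timeout(limit_seconds, StuckError) { yield }
  nil
rescue StuckError => e
  e.backtrace
end

trace = Thread.new { trace_if_stuck(0.2) { stuck_work } }.value
puts "Work exceeded the limit; it was interrupted at:"
puts trace
```

The limit of 0.2 seconds is only for the demonstration; in the real workers it would need to be longer than the slowest legitimate unit of work.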

Currently the parent thread joins all threads in order, blocking on each one until it has finished. However, each thread only exits when an OutOfTargetsError is raised. The deadlock might be avoided by using short-lived threads and moving the while loop into the parent. There are no guarantees, but maybe something like this will work?

8.times do |t|  
  fork do
    running = true
    Signal.trap("INT") do
      puts "Interrupt signal received, waiting for threads to finish..."
      running = false
    end

    data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)

    puts "Launching max #{threads_num} worker threads... (PID: #{Process.pid})"    

    threads = []
    while running
      # Start new threads until we have threads_num running
      until threads.length >= threads_num do
        threads << Thread.new {
          begin
            worker = Worker.new(data, config)
          rescue OutOfTargetsError
            # No targets left: stop spawning new threads
            running = false
          rescue Exception => e
            puts "An unexpected exception was caught: #{e.class} => #{e}"
            sleep 5
          end
        }
      end

      # Make sure the parent process doesn't spin too much
      sleep 1

      # Join finished threads
      finished_threads = threads.reject &:status
      threads -= finished_threads
      finished_threads.each &:join
    end

    data.stop_service
    EM.stop
  end
end
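The threads.reject &:status line above relies on what Thread#status returns: false for a thread that ended normally, nil for one that died from an exception, and a string such as "sleep" for a live thread. A small sketch of that behaviour, assuming a Ruby recent enough to support Thread#report_on_exception= (the thread names are illustrative only):

```ruby
threads = {
  finished: Thread.new { :done },
  failed:   Thread.new do
    Thread.current.report_on_exception = false # keep stderr quiet for the demo
    raise "boom"
  end,
  sleeping: Thread.new { sleep }
}

threads[:finished].join
begin
  threads[:failed].join # join re-raises the exception that killed the thread
rescue RuntimeError
end
# Wait until the third thread has actually gone to sleep
sleep 0.01 until threads[:sleeping].status == "sleep"

statuses = threads.map { |name, t| [name, t.status] }.to_h
done     = threads.values.reject(&:status) # falsy status means terminated

p statuses
puts "#{done.size} of #{threads.size} threads have terminated"

threads[:sleeping].kill
```

Because both false and nil are falsy, reject(&:status) picks up threads that ended either way, which is why the loop can safely join them without blocking.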
captainpete answered Nov 05 '22 10:11


I had the same problem and solved it by using this code snippet:

# Wait for all threads (other than the current thread and
# main thread) to stop running.
# Assumes that no new threads are started while waiting
def join_all
  main     = Thread.main       # The main thread
  current  = Thread.current    # The current thread
  all      = Thread.list       # All threads still running
  # Now call join on each thread
  all.each{|t| t.join unless t == current or t == main }
end

Source: The Ruby Programming Language, O'Reilly (2008)

HaNdTriX answered Nov 05 '22 08:11