 

How to fix a deadlock in join() in Ruby

I am working with multithreading in Ruby. The code snippet is:

  threads_array = Array.new(num_of_threads)
  1.upto(num_of_threads) do |i|
    Thread.abort_on_exception = true
    threads_array[i-1] = Thread.new {
      catch(:exit) do
        print "s #{i}"
        user_id = nil
        loop do
          user_id = user_ids.pop()
          if user_id == nil
            print "a #{i}"
            Thread.stop()
          end
          dosomething(user_id)
        end
      end
    }
  end
  #puts "after thread"
  threads_array.each {|thread| thread.join}

I am not using any mutex locks, but I get a deadlock. This is the output of the above code snippet:

s 2s 6s 8s 1s 11s 7s 10s 14s 16s 21s 24s 5s 26s 3s 19s 20s 23s 4s 28s 9s 12s 18s 22s 29s 30s 27s 13s 17s 15s 25a 4a 10a 3a 6a 21a 24a 16a 9a 18a 5a 28a 20a 2a 22a 11a 29a 8a 14a 23a 26a 1a 19a 7a 12fatal: deadlock detected

The output suggests that the deadlock happens once the user_ids array is empty, and that it involves Thread#join and Thread.stop.

What actually is happening and what is the solution to this error?

asked Jan 19 '12 11:01 by sravan_kumar


3 Answers

The simplest code to reproduce this issue is:

t = Thread.new { Thread.stop }
t.join # => exception in `join': deadlock detected (fatal)

Thread::stop → nil

Stops execution of the current thread, putting it into a “sleep” state, and schedules execution of another thread.

Thread#join → thr
Thread#join(limit) → thr

The calling thread will suspend execution and run thr. Does not return until thr exits or until limit seconds have passed. If the time limit expires, nil will be returned, otherwise thr is returned.

As far as I understand, you call Thread#join without arguments on each thread and wait for it to exit, but the child thread calls Thread.stop and goes to sleep. This is a deadlock: the main thread waits for the child thread to exit, while the child thread sleeps and nothing is left running that could wake it up.
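To see that the deadlock comes from nobody waking the stopped thread, here is a minimal sketch (not from the question) in which the main thread wakes the worker with Thread#run before joining it. With that wake-up call, joining succeeds instead of raising:

```ruby
# A thread that stops itself only finishes if another thread
# wakes it with Thread#run (or Thread#wakeup).
worker = Thread.new do
  Thread.stop   # sleep until someone calls run/wakeup
  :finished     # the block's return value once woken
end

# Wait until the worker has actually gone to sleep;
# calling run too early would wake it before it stops.
Thread.pass until worker.status == "sleep"

worker.run             # wake the sleeping thread
result = worker.value  # joins the thread and returns the block's value
puts result            # prints "finished"
```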

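If you call join with a limit, join gives up after the timeout and returns nil instead of raising. It does not abort the child thread; the thread simply stays asleep until the main thread finishes, at which point Ruby kills it and the program exits without a deadlock error: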

t = Thread.new { Thread.stop }
t.join 1 # => nil after 1 second; the script then exits normally (exit code 0)
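A short sketch to check that a timed join does not abort the child thread (the 0.5 second limit is an arbitrary value chosen for illustration). The stopped thread has to be terminated explicitly, here with Thread#kill:

```ruby
t = Thread.new { Thread.stop }

r = t.join(0.5)   # times out because t never exits, so join returns nil
p r               # prints nil
p t.alive?        # prints true: the timed join left the thread asleep

t.kill            # explicitly terminate the sleeping thread
t.join            # returns promptly now that t has exited
```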

I would recommend letting your worker threads exit once they have done their job, either by calling Thread.exit or by getting out of the infinite loop so that the thread's block runs to its end normally, for example:

if user_id == nil
  raise StopIteration
end

# or
if user_id == nil
  Thread.exit
end
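Applied to the question's loop, this might look like the sketch below. It rests on assumptions: `user_ids` is filled with sample data, `dosomething` is a hypothetical placeholder that just records each id, and a Mutex guards `user_ids.pop` because plain Arrays are not documented as thread-safe. Inside `loop`, `break` ends the loop the same way `raise StopIteration` does:

```ruby
# Hypothetical stand-ins for the question's data and work function.
user_ids  = (1..30).to_a
processed = Queue.new   # thread-safe collector for the placeholder work

def dosomething(user_id, out)
  out << user_id        # placeholder: just record the id
end

num_of_threads = 4
lock = Mutex.new

threads = Array.new(num_of_threads) do
  Thread.new do
    loop do
      user_id = lock.synchronize { user_ids.pop }  # guard the shared array
      break if user_id.nil?   # no work left: leave the loop so the thread ends
      dosomething(user_id, processed)
    end
  end
end

threads.each(&:join)    # every thread reaches its end, so join returns
puts processed.size     # prints 30
```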
answered Nov 05 '22 07:11 by Aliaksei Kliuchnikau


In addition to Alex Kliuchnikau's answer, I'll add that #join can also raise this error when the thread is waiting in Queue#pop. A simple and concise workaround is to call #join with a timeout.

This is from Ruby 2.2.2:

[27] pry(main)> q=Queue.new
=> #<Thread::Queue:0x00000003a39848>
[30] pry(main)> q << "asdggg"
=> #<Thread::Queue:0x00000003a39848>
[31] pry(main)> q << "as"
=> #<Thread::Queue:0x00000003a39848>
[32] pry(main)> t = Thread.new {
[32] pry(main)*   while s = q.pop
[32] pry(main)*     puts s
[32] pry(main)*   end  
[32] pry(main)* }  
asdggg
as
=> #<Thread:0x00000003817ce0@(pry):34 sleep>
[33] pry(main)> q << "asg"
asg
=> #<Thread::Queue:0x00000003a39848>
[34] pry(main)> q << "ashg"
ashg
=> #<Thread::Queue:0x00000003a39848>
[35] pry(main)> t.join
fatal: No live threads left. Deadlock?
from (pry):41:in `join'
[36] pry(main)> t.join(5)
=> nil
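An alternative to the timeout is to tell the consumer that no more items are coming. The sketch below assumes Ruby 2.3 or later, where Queue#close exists: once the queue is closed and empty, Queue#pop returns nil, so the `while` loop ends and a plain join returns. On Ruby 2.2.2 (as in the session above), pushing a nil sentinel with `q << nil` should have the same effect on this particular loop, since it also stops on nil:

```ruby
q = Queue.new
seen = []

consumer = Thread.new do
  # Queue#pop returns nil once the queue is closed and drained, ending the loop.
  while (item = q.pop)
    seen << item
  end
end

%w[asdggg as asg ashg].each { |s| q << s }
q.close         # signal that no more items will be pushed
consumer.join   # returns normally; no deadlock error
p seen          # prints ["asdggg", "as", "asg", "ashg"]
```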
answered Nov 05 '22 09:11 by akostadinov


If I understand your intentions correctly, I would consider something simpler (and probably safer; calling user_ids.pop() from within the threads looks scary to me):

user_ids = (0..19).to_a
number_of_threads = 3

user_ids \
  .each_slice(user_ids.length / number_of_threads + 1) \
  .map { |slice| 
      Thread.new(slice) { |s| 
        puts s.inspect 
      }
  }.map(&:join)
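A variation on the same idea, assuming each thread should return a result rather than print it: Thread#value joins a thread and returns its block's value. Rounding the slice size up with `ceil` keeps the work spread across all threads even when the count divides evenly (with `length / n + 1`, 6 ids and 3 threads give two slices of 3). The `id * 10` step is a hypothetical placeholder for the real per-user work:

```ruby
user_ids = (0..19).to_a
number_of_threads = 3
# Round up so there are at most number_of_threads slices; never go below 1.
slice_size = [(user_ids.length / number_of_threads.to_f).ceil, 1].max  # 7 here

threads = user_ids.each_slice(slice_size).map do |slice|
  Thread.new(slice) do |ids|
    ids.map { |id| id * 10 }   # hypothetical per-user work
  end
end

results = threads.flat_map(&:value)  # value joins each thread, order is preserved
p results.size                       # prints 20
```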
answered Nov 05 '22 09:11 by Victor Moroz