Thread safe Enumerator in Ruby

TLDR: Is there a thread-safe version of the Enumerator class in Ruby?

What I'm trying to do:

I have a method in a Ruby on Rails application that I want to run concurrently. The method creates a zip file containing reports from the site, where each file in the zip is a PDF. Converting HTML to PDF is fairly slow, hence the desire to multi-thread.

How I expected to do it:

I wanted to use 5 threads, so I figured the threads could share a single Enumerator. Each thread would pop a value from the Enumerator and do stuff with it. Here's how I was thinking it would work:


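t = Zip::OutputStream.write_buffer do |z|
  mutex = Mutex.new
  gen = Enumerator.new { |g|
    Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}).find_each do |report|
      g.yield report
    end
  }
  5.times.map {
    Thread.new do
      begin
        loop do
          # gen.next resumes the Enumerator's internal Fiber; calling it from a
          # second thread is what raises "fiber called across threads".
          # A local variable replaces @report, which all five threads would
          # share and overwrite between the two synchronize blocks.
          report = mutex.synchronize { gen.next }
          title = report.title + "_" + report.id.to_s
          title += ".pdf" unless title.end_with?(".pdf")
          # The slow HTML-to-PDF conversion runs outside the lock.
          pdf = PDFKit.new(render_to_string(:template => partial_url, locals: {array: [report]},
                                            :layout => false)).to_pdf
          # Writes to the shared zip stream are serialized so entries don't interleave.
          mutex.synchronize do
            z.put_next_entry(title)
            z.write(pdf)
          end
        end
      rescue StopIteration
        # The enumerator is exhausted; let the thread finish.
      end
    end
  }.each { |thread| thread.join }
end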

What happened when I tried it:

When I ran the above code, I got the following error:


FiberError at /generate_report
fiber called across threads

After some searching, I came across this post, which recommended using a Queue instead of an Enumerator, because Queues are thread-safe while Enumerators are not. While that might be reasonable for non-Rails applications, it is impractical for me.
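The error can be reproduced without Rails. A minimal sketch, assuming only core Ruby: `Enumerator#next` runs the enumerator's block inside a Fiber, and a Fiber can only be resumed by the thread that created it, so the first `next` ties the enumerator to one thread. A `Queue`, by contrast, can be popped from any thread. The names `gen`, `first`, `error`, `queue` and `from_other_thread` are illustrative.

```ruby
# Enumerator#next (external iteration) runs the enumerator's block in a Fiber.
gen = Enumerator.new do |y|
  3.times { |i| y << i }
end

# The first call creates that Fiber and runs it in the main thread.
first = gen.next

# Resuming the same Fiber from another thread is not allowed.
error = Thread.new {
  begin
    gen.next
    nil
  rescue FiberError => e
    e
  end
}.value

puts first         # prints 0
puts error.class   # prints FiberError
puts error.message

# A Queue, by contrast, can be popped from any thread.
queue = Queue.new
[0, 1, 2].each { |i| queue << i }
from_other_thread = Thread.new { queue.pop }.value
puts from_other_thread # prints 0
```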

Why I can't just use a Queue:

The nice thing about Rails 4 ActiveRecord is that it doesn't load a query until it is iterated over. And if you iterate with a method like find_each, it fetches records in batches of 1000, so you never have to hold an entire table in RAM at once. The result of the query I'm using, Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}), is large. Very large. So I need to load it on the fly, rather than doing something like:


gen = Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}).map(&queue.method(:push))

which would load the entire query result into RAM.

Finally the question:

Is there a thread-safe way of doing this:


gen = Enumerator.new{ |g|
        Report.all.includes(...).find_each do |report|
          g.yield report
        end
}

So that I can pop data from gen across multiple threads, without having to load my entire Report table (and all of the includes) into RAM?


asked Sep 11 '15 00:09

Ephraim

1 Answer

If you start the worker threads before filling the queue, they will consume items while you are still adding them. As a rule of thumb, the network (here, fetching each batch from the database) is slower than the CPU, so each batch should be mostly consumed by the time the next one arrives:


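queue = Queue.new

# Start the workers first. Queue#pop without the non_block flag blocks until
# an item is available, so the workers wait instead of exiting (or raising
# ThreadError) while the queue is still empty. nil means "no more work".
t1 = Thread.new do
  while (item = queue.pop)
    p item
    sleep(0.1)
  end
end
t2 = Thread.new do
  while (item = queue.pop)
    p item
    sleep(0.1)
  end
end

(0..1000).each { |i| queue.push(i) }
2.times { queue.push(nil) } # one stop signal per worker

t1.join
t2.join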

If the producer still outruns the workers, so that the queue grows too large, you can use SizedQueue instead, which blocks push once the queue holds the given maximum number of items:


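queue = SizedQueue.new(100)

# Workers block on pop until an item arrives and stop when they receive nil.
t1 = Thread.new do
  while (item = queue.pop)
    p "#{item} - #{queue.size}"
    sleep(0.1)
  end
end
t2 = Thread.new do
  while (item = queue.pop)
    p item
    sleep(0.1)
  end
end

# push blocks while the queue already holds 100 items, so the producer
# never gets more than 100 items ahead of the workers.
(0..300).each { |i| queue.push(i) }
2.times { queue.push(nil) } # one stop signal per worker

t1.join
t2.join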
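Building on the SizedQueue idea, the shared Enumerator from the question can be approximated with a small wrapper: one producer thread runs the lazy iteration (find_each in the question) and pushes into a SizedQueue, while any number of worker threads call next. This is a sketch under stated assumptions: the class name QueueEnumerator is hypothetical, a plain range stands in for Report.all.includes(...).find_each so it runs without Rails, items are assumed never to be nil (true for ActiveRecord records), and Queue#close requires Ruby 2.3 or later.

```ruby
# Hypothetical helper, not part of Ruby or Rails: a lazily filled,
# thread-safe source that several worker threads can share.
class QueueEnumerator
  def initialize(buffer_size = 100, &producer)
    @queue = SizedQueue.new(buffer_size)
    # A single producer thread runs the block. Pushing into the SizedQueue
    # blocks while it is full, so at most buffer_size items wait in memory.
    Thread.new do
      begin
        producer.call(@queue)
      ensure
        @queue.close # Ruby 2.3+: waiting consumers get nil once it is drained
      end
    end
  end

  # Safe to call from any thread. Assumes items are never nil.
  def next
    item = @queue.pop
    raise StopIteration if item.nil? # closed and empty: no more items
    item
  end
end

gen = QueueEnumerator.new(10) do |out|
  # Stand-in for: Report.all.includes(...).find_each { |report| out << report }
  (1..50).each { |i| out << i }
end

results = Queue.new
workers = 5.times.map do
  Thread.new do
    # Kernel#loop ends quietly when its block raises StopIteration.
    loop { results << gen.next * 2 }
  end
end
workers.each(&:join)

collected = []
collected << results.pop until results.empty?
p collected.size # prints 50
```

Closing the queue, rather than pushing one nil per worker, means the producer doesn't need to know how many workers there are. If workers can stop early, the producer may stay blocked on a full queue, so a real version would need a way to cancel it. In Rails, the producer thread would likely also need to run its query inside ActiveRecord::Base.connection_pool.with_connection so its database connection is returned to the pool.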

answered Oct 06 '22 00:10

Uri Agassi
