
How to use ruby fibers to avoid blocking IO

Tags:

ruby

ruby-1.9

I need to upload a bunch of files in a directory to S3. Since more than 90% of the upload time is spent waiting for the HTTP request to finish, I want to run several uploads at once somehow.

Can Fibers help me with this at all? They are described as a way to solve this sort of problem, but I can't think of any way I can do any work while an http call blocks.

Is there any way I can solve this problem without threads?

Sean Clark Hess asked Mar 03 '10 00:03


2 Answers

I'm not up on fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue: http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html

Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
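As a rough sketch of that producer/consumer split, the main program below fills a Queue and a small pool of consumer threads drains it. The upload_file body, the upload_all helper, the worker count, and the :done stop marker are all hypothetical stand-ins for illustration; a real consumer would call your S3 library instead of sleeping.

```ruby
require 'thread' # needed for Queue on Ruby 1.9; harmless on newer versions

# Hypothetical stand-in for the real S3 upload; sleep mimics network wait.
def upload_file(path)
  sleep 0.1
  "uploaded #{path}"
end

# Uploads every path using a fixed number of consumer threads.
def upload_all(paths, worker_count = 4)
  queue   = Queue.new
  results = Queue.new # Queue is thread-safe, so consumers can push results here

  # Producer: the main program enqueues the work.
  paths.each { |path| queue << path }
  # One stop marker per consumer so each one knows when to exit.
  worker_count.times { queue << :done }

  # Consumers: each thread pops paths until it sees a stop marker.
  workers = Array.new(worker_count) do
    Thread.new do
      while (path = queue.pop) != :done
        results << upload_file(path)
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

puts upload_all(%w[a.txt b.txt c.txt d.txt e.txt]).sort
```

A fixed pool like this caps the number of simultaneous requests, which may matter if the directory holds thousands of files.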

If you want to upload multiple files at once, simply launch a new Thread for each file:

t = Thread.new do
  upload_file(param1, param2)
end
@all_threads << t

Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):

@all_threads.each do |t|
  t.join if t.alive?
end

The Queue can be either a @member_variable or a $global, as long as both the producer and the consumer can reach it.
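Putting the two snippets above together, a self-contained version of the thread-per-file approach might look like the following. The upload_file body and the upload_each_in_thread helper are hypothetical; the sketch also swaps join for Thread#value, which waits for the thread like join does and additionally returns the block's result (and re-raises any exception the thread hit).

```ruby
# Hypothetical stand-in for the real upload; returns a value so we can collect it.
def upload_file(path)
  sleep 0.1 # mimics waiting on the HTTP request
  path.upcase
end

# Starts one thread per file and returns the results in input order.
def upload_each_in_thread(paths)
  threads = paths.map do |path|
    # Pass path as a thread argument so each thread gets its own copy.
    Thread.new(path) { |p| upload_file(p) }
  end
  # value blocks until the thread finishes, then hands back its result.
  threads.map(&:value)
end

puts upload_each_in_thread(%w[a.txt b.txt c.txt]).inspect
```

One thread per file is fine for a modest number of files; for a very large directory, a bounded pool fed by a Queue is probably the safer choice.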

audiodude answered Nov 20 '22 03:11


Aaron Patterson (@tenderlove) uses an example almost exactly like yours to explain why you can and should use threads to achieve concurrency in your situation.

Most I/O libraries are now smart enough to release the GVL (Global VM Lock, which most people know as the GIL, or Global Interpreter Lock) when doing I/O. A simple function call in C does this. You don't need to worry about the C code; what it means for you is that most I/O libraries worth their salt release the GVL and let other threads execute while the thread doing the I/O waits for the data to return.

If what I just said was confusing, you don't need to worry about it too much. The main thing to know is that if you use a decent library for your HTTP requests (or any other I/O operation, for that matter: database, interprocess communication, whatever), the Ruby interpreter (MRI) can release the lock on the interpreter and let other threads execute while one thread waits for its I/O to return. If the next thread has its own I/O to do, the interpreter does the same thing again (assuming the I/O library is built to use this feature of Ruby, which I believe most are these days).
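A quick way to see this effect for yourself is to time the same blocking calls run one after another and then in threads. In the sketch below, sleep stands in for a blocking HTTP request (an assumption for illustration; like well-behaved I/O, it lets other threads run while it waits), and the helper names are hypothetical.

```ruby
# Stand-in for a blocking HTTP request (an assumption for illustration).
def fake_request(seconds)
  sleep seconds
end

# Returns how long the given block took, in seconds.
def elapsed
  start = Time.now
  yield
  Time.now - start
end

# Runs the requests one after another.
def serial_time(count, seconds)
  elapsed { count.times { fake_request(seconds) } }
end

# Runs each request in its own thread and waits for all of them.
def threaded_time(count, seconds)
  elapsed do
    Array.new(count) { Thread.new { fake_request(seconds) } }.each(&:join)
  end
end

puts "serial:   %.2fs" % serial_time(5, 0.2)
puts "threaded: %.2fs" % threaded_time(5, 0.2)
```

The threaded run should take roughly as long as a single request rather than the sum of all of them. If a real HTTP library does not show a similar gap, that is a hint it may be holding the GVL while it waits.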

So, to sum up: use threads! You should see the performance benefit. If not, check whether your HTTP library is using the rb_thread_blocking_region() function in C (superseded by rb_thread_call_without_gvl() from Ruby 2.0 on) and, if not, find out why not. Maybe there is a good reason, or maybe you need to consider a better library.

The link to the Aaron Patterson video is here: http://www.youtube.com/watch?v=kufXhNkm5WU

It is worth a watch, even if just for the laughs, as Aaron Patterson is one of the funniest people on the internet.

Joe Edgar answered Nov 20 '22 03:11