Ruby Concurrency I/O

Following up on the question "Ruby thread limit - Also for any language":

I am trying to understand why my threads are not working. Some answers were pretty clear, such as:

"..creating 4 subprocesses with fork will utilize your 4 cores", which will be my final approach, since threads don't seem to work in my case.

Also this:

"..Ruby MRI threading will not by itself fully utilise a multi-core CPU with running Ruby code. But whether that's a problem for you depends on what the threads are doing. If they are making long-running I/O calls to other processes on the same machine, you will see the benefit without needing separate processes. Threading and multi-processing as subjects can get quite complex doing even simple things. Most languages will make some compromises on what is easy and what is difficult out of the box..."

Taking into consideration the second one, I have removed any processing from my code and just left I/O in it.

Here it is:

beginning_time = Time.now
img_processor.load_image(frames_dir+"/frame_0001.png")
img_processor.load_image(frames_dir+"/frame_0002.png")
end_time = Time.now
puts "Time elapsed #{(end_time - beginning_time)*1000} milliseconds"

beginning_time = Time.now
greyscale_frames_threads = []  # collects the threads so they can be joined below
(1..2).each do |frame_index|
    greyscale_frames_threads << Thread.new(frame_index) do |frame_number|
        puts "Loading Image #{frame_number}"
        img_processor.load_image(frames_dir + "/frame_%04d.png" % frame_number)
    end
end

puts "Joining Threads"
greyscale_frames_threads.each { |thread| thread.join } # blocks the main thread until every load has finished
end_time = Time.now
puts "Time elapsed #{(end_time - beginning_time)*1000} milliseconds"

And what I am getting is this...

For the first non-threaded case:

Time elapsed 15561.358 milliseconds

For the second threaded case:

Time elapsed 15442.401 milliseconds

OK, where is the performance increase? Am I missing something? Is the HDD blocking? Do I really need to spawn processes to see real parallelism in Ruby?

asked Jun 19 '13 12:06 by Trt Trt


1 Answer

Do I really need to spawn processes to see real parallelism in ruby?

Yes, I think so:

require 'digest'
require 'benchmark'

# CPU-bound work: hash a 100 MB string
def do_stuff
  Digest::SHA256.new.digest "a" * 100_000_000
end

N = 10
Benchmark.bm(10) do |x|

  x.report("sequential") do
    N.times do
      do_stuff
    end
  end

  x.report("subprocess") do
    N.times do
      fork { do_stuff }
    end
    Process.waitall
  end

  x.report("thread") do
    threads = []
    N.times do
      threads << Thread.new { do_stuff }
    end
    threads.each(&:join)
  end

end

Results for MRI 2.0.0:

                 user     system      total        real
sequential   3.200000   0.180000   3.380000 (  3.383322)
subprocess   0.000000   0.000000   6.600000 (  1.068517)
thread       3.290000   0.210000   3.500000 (  3.496207)

The first block (sequential) runs do_stuff N times (10 here), one after another. The second block (subprocess) runs on 4 cores, whereas the third block (thread) runs on 1 core.
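If you go the fork route and need the results back in the parent (a loaded frame, for example), each child has to send them over some channel, because a forked process gets its own copy of memory. Here is a minimal sketch using a pipe and Marshal. fork_map is a hypothetical helper name, not something from Ruby's standard library, and the squaring block is just a stand-in for real work. It assumes a platform where fork is available (not Windows) and return values that Marshal can serialize.

```ruby
# Hypothetical helper: runs the block once per item in a forked child
# process and returns the children's results in the original order.
def fork_map(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    pid = fork do
      reader.close                               # child only writes
      writer.binmode
      writer.write(Marshal.dump(yield(item)))    # serialize the block's result
      writer.close
    end
    writer.close                                 # parent only reads
    [pid, reader]
  end

  children.map do |pid, reader|
    reader.binmode
    data = reader.read                           # returns at EOF, i.e. once the child closes its end
    reader.close
    Process.wait(pid)                            # reap the child
    Marshal.load(data)
  end
end

squares = fork_map([1, 2, 3, 4]) { |n| n * n }   # stand-in for CPU-heavy work
puts squares.inspect
```

Serializing large results (such as whole images) through a pipe has its own cost, so whether this pays off depends on how heavy the work in each child is compared to the data it sends back.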


If you change do_stuff to:

def do_stuff
  sleep(1)
end

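The result is different, because sleep (like most blocking I/O) releases MRI's global lock, so the threads can wait concurrently: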

                 user     system      total        real
sequential   0.000000   0.000000   0.000000 ( 10.021893)
subprocess   0.000000   0.010000   0.080000 (  1.013693)
thread       0.000000   0.000000   0.000000 (  1.003463)
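To see the thread case in isolation, here is a small sketch that is separate from the benchmark above. fake_io is a made-up stand-in for a blocking call (a socket read, say), and the 0.2 second duration is arbitrary. It only illustrates that threads overlap while they are blocked; it says nothing about how a particular image-loading library behaves.

```ruby
require 'benchmark'

# Hypothetical stand-in for a blocking I/O call. While a thread sleeps,
# MRI lets other threads run.
def fake_io(seconds)
  sleep(seconds)
  seconds
end

# Four blocking calls, one after another.
sequential = Benchmark.realtime { 4.times { fake_io(0.2) } }

# The same four calls, each in its own thread.
threaded = Benchmark.realtime do
  4.times.map { Thread.new { fake_io(0.2) } }.each(&:join)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

If your own threaded code shows no speedup like this, a likely explanation is that the time is being spent in Ruby code (decoding, for instance) rather than waiting on the disk. In that case the user column in Benchmark's output will be close to the real column.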
answered Oct 19 '22 18:10 by Stefan