Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ruby concurrency/asynchronous processing (with simple use case)

I was looking into ruby's parallel/asynchronous processing capabilities and read many articles and blog posts. I looked through EventMachine, Fibers, Revactor, Reia, etc, etc. Unfortunately, I wasn't able to find a simple, effective (and non-IO-blocking) solution for this very simple use case:

File.open('somelogfile.txt') do |file|
  while line = file.gets      # (R) Read from IO
    line = process_line(line) # (P) Process the line
    write_to_db(line)         # (W) Write the output to some IO (DB or file)
  end
end

Is you can see, my little script is performing three operations read (R), process (P) & write (W). Let's assume - for simplicity - that each operation takes exactly 1 unit of time (e.g. 10ms), the current code would therefore do something like this (5 lines):

Time:       123456789012345 (15 units in total)
Operations: RPWRPWRPWRPWRPW

But, I would like it to do something like this:

Time:       1234567 (7 units in total)
Operations: RRRRR
             PPPPP
              WWWWW

Obviously, I could run three processes (reader, processor & writer) and pass read lines from reader into the processor queue and then pass processed lines into the writer queue (all coordinated via e.g. RabbitMQ). But, the use-case is so simple, it just doesn't feel right.

Any clues on how this could be done (without switching from Ruby to Erlang, Closure or Scala)?

like image 251
Dim Avatar asked Oct 25 '10 12:10

Dim


2 Answers

If you need it to be truly parallel (from a single process) I believe you'll have to use JRuby to get true native threads and no GIL.

You could use something like DRb to distribute the processing across multiple processes / cores, but for your use case this is a bit much. Instead, you could try having multiple processes communicate using pipes:

$ cat somelogfile.txt | ruby ./proc-process | ruby ./proc-store

In this scenario each piece is its own process that can run in parallel but are communicating using STDIN / STDOUT. This is probably the easiest (and quickest) approach to your problem.

# proc-process
while line = $stdin.gets do
  # do cpu intensive stuff here
  $stdout.puts "data to be stored in DB"
  $stdout.flush # this is important
end

# proc-store
while line = $stdin.gets do
  write_to_db(line)
end
like image 125
JEH Avatar answered Nov 12 '22 18:11

JEH


Check out peach (http://peach.rubyforge.org/). Doing a parallel "each" couldn't be simpler. However, as the documentation says, you'll need to run under JRuby in order to use the JVM's native threading.

See Jorg Mittag's response to this SO question for a lot of detail on the multithreading capabilities of the various Ruby interpreters.

like image 39
Mark Thomas Avatar answered Nov 12 '22 19:11

Mark Thomas