I was looking into Ruby's parallel/asynchronous processing capabilities and read many articles and blog posts. I looked through EventMachine, Fibers, Revactor, Reia, etc. Unfortunately, I wasn't able to find a simple, effective (and non-IO-blocking) solution for this very simple use case:
File.open('somelogfile.txt') do |file|
  while line = file.gets       # (R) Read from IO
    line = process_line(line)  # (P) Process the line
    write_to_db(line)          # (W) Write the output to some IO (DB or file)
  end
end
As you can see, my little script performs three operations: read (R), process (P) and write (W). Let's assume, for simplicity, that each operation takes exactly 1 unit of time (e.g. 10ms). For 5 lines, the current code would therefore do something like this:
Time:       123456789012345 (15 units in total)
Operations: RPWRPWRPWRPWRPW
But, I would like it to do something like this:
Time:       1234567 (7 units in total)
Operations: RRRRR
             PPPPP
              WWWWW
Obviously, I could run three processes (reader, processor & writer) and pass read lines from reader into the processor queue and then pass processed lines into the writer queue (all coordinated via e.g. RabbitMQ). But, the use-case is so simple, it just doesn't feel right.
Any clues on how this could be done (without switching from Ruby to Erlang, Clojure or Scala)?
If you need it to be truly parallel (from a single process) I believe you'll have to use JRuby to get true native threads and no GIL.
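For reference, here is a minimal sketch of the same read/process/write pipeline done inside one process with plain threads and queues from the standard library. The names `run_pipeline` and `DONE` are made up for illustration, and `process_line` is just a stand-in for the real work. On MRI the GIL means the CPU-bound (P) step will not run in parallel with other Ruby code, so this mainly overlaps the time spent waiting on IO; on JRuby the threads can run on separate cores.

```ruby
require 'thread'   # harmless on modern Rubies, needed for Queue on old ones
require 'stringio' # only used by the usage example at the bottom

DONE = :done # hypothetical sentinel marking the end of the stream

# Stand-in for the real processing step (assumption, not from the question).
def process_line(line)
  line.strip.upcase
end

# Runs reader, processor and writer as three threads joined by two queues.
def run_pipeline(input, output)
  to_process = Queue.new
  to_write   = Queue.new

  reader = Thread.new do
    input.each_line { |line| to_process << line } # (R)
    to_process << DONE
  end

  processor = Thread.new do
    while (line = to_process.pop) != DONE
      to_write << process_line(line)              # (P)
    end
    to_write << DONE
  end

  writer = Thread.new do
    while (line = to_write.pop) != DONE
      output.puts(line)                           # (W)
    end
  end

  [reader, processor, writer].each(&:join)
  output
end

# Usage with in-memory IO; a File and a DB writer would slot in the same way.
result = run_pipeline(StringIO.new("a\nb\nc\n"), StringIO.new)
puts result.string
```

Because there is one thread per stage and the queues are FIFO, lines come out in the order they went in. If the reader is much faster than the writer, swapping `Queue` for `SizedQueue` would keep memory bounded.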
You could use something like DRb to distribute the processing across multiple processes / cores, but for your use case this is a bit much. Instead, you could try having multiple processes communicate using pipes:
$ cat somelogfile.txt | ruby ./proc-process | ruby ./proc-store
In this scenario each piece is its own process, so the stages can run in parallel while communicating over STDIN / STDOUT. This is probably the easiest (and quickest) approach to your problem.
# proc-process
while line = $stdin.gets do
  # do cpu intensive stuff here
  $stdout.puts "data to be stored in DB"
  $stdout.flush # this is important
end

# proc-store
while line = $stdin.gets do
  write_to_db(line)
end
Check out peach (http://peach.rubyforge.org/). Doing a parallel "each" couldn't be simpler. However, as the documentation says, you'll need to run under JRuby in order to use the JVM's native threading.
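To give a rough idea of what a parallel "each" does, here is a small sketch using only stdlib threads. This is not peach's implementation or its API; `parallel_each` and its `thread_count` parameter are hypothetical names used for illustration. It splits the collection into slices and hands each slice to its own thread.

```ruby
require 'thread'

# Hypothetical helper (not part of peach): runs the block over items,
# spreading the items across up to thread_count threads.
def parallel_each(items, thread_count = 4, &block)
  return items if items.empty?

  # Round up so every item lands in some slice.
  slice_size = (items.size / thread_count.to_f).ceil

  threads = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.each { |item| block.call(item) } }
  end

  threads.each(&:join) # wait for all slices to finish
  items
end

# Usage: Queue is thread-safe, so the block can push results into it.
results = Queue.new
parallel_each(%w[a b c d e], 2) { |line| results << line.upcase }
```

The order in which items are handled is not guaranteed, so anything the block touches has to be thread-safe. As with any thread-based approach, CPU-bound work only runs truly in parallel on an interpreter without a GIL, such as JRuby.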
See Jorg Mittag's response to this SO question for a lot of detail on the multithreading capabilities of the various Ruby interpreters.