I've been looking at optimizing a Ruby program that's quite calculation-intensive and works on a lot of data. I don't know C and have chosen Ruby (not that I know it well either), and I'm quite happy with the results, apart from the time it takes to execute. It is a lot of data, and without spending any money, I'd like to know what I can do to make sure I'm maximizing my own system's resources.
When I run a basic Ruby program, does it use a single processor? If I have not specifically assigned tasks to a processor, Ruby won't read my program and magically load each processor to complete the program as fast as possible will it? I'm assuming no...
I've been reading a bit on speeding up Ruby, and in another thread read that Ruby does not support true multithreading (though it said JRuby does). But if I were to "break up" my program into two chunks that can be run in separate instances and run these in parallel, would these two chunks run on two separate processors automatically? If I had four processors and opened up four shells and ran four separate parts (1/4 each) of the program, would it complete in 1/4 the time?
After reading the comments I decided to give JRuby a shot. Porting the app over wasn't that difficult. I haven't used "peach" yet, but just by running it in JRuby, the app runs in 1/4 the time!!! Insane. I didn't expect that much of a change. Going to give .peach a shot now and see how that improves things. Still can't believe that boost.
Just gave peach a try. Ended up shaving another 15% off the time. So switching to JRuby and using Peach was definitely worth it.
Thanks everyone!
Use JRuby and the peach gem, and it couldn't be easier. Just replace an .each with .peach and voila, you're executing in parallel. There are additional options to control exactly how many threads are spawned, etc. I have used this and it works great.
You get close to n times speedup, where n is the number of CPUs/cores available. I find that the optimal number of threads is slightly more than the number of CPUs/cores.
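To make the idea concrete, here is a minimal sketch of what a peach-style helper might look like using only the standard Thread class. The name parallel_each and its thread_count parameter are made up for illustration; this is not the peach gem's actual implementation. On JRuby the threads can run on separate cores; on MRI the same code runs but CPU-bound work will not speed up.

```ruby
# Hypothetical helper (not the peach gem's code): splits items into
# roughly equal slices and walks each slice in its own thread.
def parallel_each(items, thread_count = 4, &block)
  return items if items.empty?

  slice_size = (items.size / thread_count.to_f).ceil
  threads = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.each { |item| block.call(item) } }
  end
  threads.each(&:join) # wait for every slice to finish
  items
end

# Usage: collect squares; the Mutex guards the shared array.
lock = Mutex.new
squares = []
parallel_each((1..8).to_a, 4) { |n| lock.synchronize { squares << n * n } }
puts squares.sort.inspect
```

Results arrive in whatever order the threads finish, which is why the usage example sorts them before printing.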
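As others have said, the MRI implementation of Ruby (the one most people use) cannot run Ruby code on more than one core at a time: Ruby 1.8 uses green threads, and 1.9 and later use native threads behind a global interpreter lock. Hence you cannot split CPU-bound work between CPU cores by launching more threads under MRI.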
However if your process is IO-bound (restricted by disk or network activity for example), then you may still benefit from multiple MRI-threads.
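A rough way to see this for yourself, assuming sleep is an acceptable stand-in for a blocking disk or network call (fake_io_call and timed_io_calls are made-up names for illustration):

```ruby
# Hypothetical stand-in for a blocking disk or network call.
def fake_io_call(id)
  sleep 0.2
  id
end

# Runs `count` fake I/O calls, one per thread.
# Returns [results, seconds elapsed].
def timed_io_calls(count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = (1..count).map { |id| Thread.new { fake_io_call(id) } }
  results = threads.map(&:value) # value joins the thread and returns its result
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [results, elapsed]
end

results, elapsed = timed_io_calls(4)
puts "results: #{results.inspect}, elapsed: #{elapsed.round(2)}s"
```

Even on MRI the four waits should overlap, so the elapsed time should come out much closer to 0.2 seconds than to 0.8, because a thread releases the interpreter lock while it is blocked.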
JRuby, on the other hand, runs native threads with no global lock, meaning you can use threads to split work between CPU cores.
But all is not lost. With MRI (and all the other ruby implementations), you can still use processes to split work.
On Unix-like systems this can be done using Process.fork (it is not available on Windows), for example like this:
    Process.fork {
      10.times {
        # Do some work in process 1
        sleep 1
        puts "Hello 1"
      }
    }

    Process.fork {
      10.times {
        # Do some work in process 2
        sleep 1
        puts "Hello 2"
      }
    }

    # Wait for all child processes to finish
    # (Process.wait alone returns as soon as one child exits)
    Process.waitall
Using fork will split the processing between CPU cores, so if you can live without threads then separate processes are one way to do it.
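Forked children do not share memory with the parent, so a calculation split this way needs some channel to send results back. Below is a minimal sketch that gives each chunk of data its own child process and returns the result through a pipe with Marshal. The helper name fork_map_chunks is made up for illustration, and the sketch assumes a Unix-like system and results that Marshal can serialize.

```ruby
# Hypothetical helper: runs the block on each chunk in its own child
# process and returns the results in chunk order.
def fork_map_chunks(chunks, &block)
  children = chunks.map do |chunk|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = Process.fork do
      reader.close
      writer.write(Marshal.dump(block.call(chunk)))
      writer.close
      exit!(0) # leave at once, skipping at_exit handlers inherited from the parent
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read) # read blocks until the child closes its end
    reader.close
    Process.wait(pid)
    result
  end
end

# Usage: sum 1..1000 in four chunks, one child process per chunk.
chunks = (1..1_000).each_slice(250).to_a
partial_sums = fork_map_chunks(chunks) { |chunk| chunk.inject(:+) }
puts partial_sums.inject(:+)
```

With one chunk per core, the run time should approach that of the slowest chunk plus the cost of forking and serializing, so a four-way split is unlikely to reach exactly a quarter of the original time.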