I have a Ruby script that reads a huge table (~20m rows), does some processing, and feeds the results to Solr for indexing. This has been a big bottleneck in our pipeline. I'd like to speed things up with some kind of parallelism, but I am confused about Ruby's multithreading model. Our servers run
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]
. From this blog post and this question on StackOverflow, it's clear that Ruby 1.8 does not have "real" multithreading (its green threads cannot run on more than one core). Our servers have multiple cores, so the parallel gem looks like another option; something like the sketch below is what I had in mind.
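This is only a rough sketch; fetch_batch and index_batch_in_solr are hypothetical stand-ins for our existing database/Solr code, and the batch sizes are made up:

require 'parallel'   # gem install parallel

# Hypothetical stand-in for the existing database read,
# e.g. SELECT * FROM my_table LIMIT limit OFFSET offset.
def fetch_batch(offset, limit)
  []
end

# Hypothetical stand-in for the existing processing + Solr push.
def index_batch_in_solr(rows)
end

# Split the ~20m rows into batches of 10,000 and index them in
# 4 worker processes (separate processes sidestep the 1.8 GIL).
batches = (0...2_000).map { |i| [i * 10_000, 10_000] }

Parallel.each(batches, :in_processes => 4) do |offset, limit|
  index_batch_in_solr(fetch_batch(offset, limit))
end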
Which approach should I go with? Also, any input on systems that read from a database in parallel and feed the results downstream would be highly appreciated.
You can parallelize this at the OS level. Change the script so that it takes a range of lines from your input file:
$ reader_script --lines=10000:20000 mytable.txt
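A minimal sketch of such a script, assuming the table has been dumped to a flat file with one row per line; process_row and send_to_solr are hypothetical stand-ins for your existing processing and indexing code:

#!/usr/bin/env ruby
# Hypothetical stand-ins for the existing per-row logic.
def process_row(line)
  line.strip
end

def send_to_solr(doc)
  # push the processed document to Solr here
end

# Parse --lines=START:END and the input file from the arguments.
abort "usage: reader_script --lines=START:END FILE" unless ARGV[0] =~ /\A--lines=(\d+):(\d+)\z/
from, to = $1.to_i, $2.to_i

File.open(ARGV[1]) do |f|
  f.each_with_index do |line, i|
    next  if i < from    # skip rows before the range
    break if i >= to     # stop once the range is exhausted
    send_to_solr(process_row(line))
  end
end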
Then execute multiple instances of the script, one per range:
$ reader_script --lines=0:10000 mytable.txt&
$ reader_script --lines=10000:20000 mytable.txt&
$ reader_script --lines=20000:30000 mytable.txt&
Unix will schedule the separate processes across the available cores automatically.
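If you drive this from a wrapper script, a plain wait at the end blocks until every backgrounded job has exited, so you know when the whole table has been indexed:

$ wait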