Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallelizing methods in Rails

My Rails web app has dozens of methods from making calls to an API and processing query result. These methods have the following structure:

def method_one
  batch_query_API
  process_data
end
..........
def method_nth
  batch_query_API
  process_data
end

def summary
  method_one
  ......
  method_nth
  collect_results
end

How can I run all query methods at the same time instead of sequential in Rails (without firing up multiple workers, of course)?

Edit: all of the methods are called from a single instance variable. I think this limits the use of Sidekiq or Delay in submitting jobs simultaneously.

like image 849
AdamNYC Avatar asked May 14 '13 19:05

AdamNYC


3 Answers

Ruby has the excellent promise gem. Your example would look like:

require 'future'

def method_one
...
def method_nth

def summary
  result1 = future { method_one }
  ......
  resultn = future { method_nth }
  collect_results result1, ..., resultn
end

Simple, isn't it? But let's get to more details. This is a future object:

result1 = future { method_one }

It means, the result1 is getting evaluated in the background. You can pass it around to other methods. But result1 doesn't have any result yet, it is still processing in the background. Think of passing around a Thread. But the major difference is - the moment you try to read it, instead of passing it around, it blocks and waits for the result at that point. So in the above example, all the result1 .. resultn variables will keep getting evaluated in the background, but when the time comes to collect the results, and when you try to actually read these values, the reads will wait for the queries to finish at that point.

Install the promise gem and try the below in Ruby console:

require 'future'
x = future { sleep 20; puts 'x calculated'; 10 }; nil
# adding a nil to the end so that x is not immediately tried to print in the console
y = future { sleep 25; puts 'y calculated'; 20 }; nil

# At this point, you'll still be using the console!
# The sleeps are happening in the background

# Now do:
x + y
# At this point, the program actually waits for the x & y future blocks to complete

Edit: Typo in result, should have been result1, change echo to puts

like image 101
Subhas Avatar answered Nov 06 '22 06:11

Subhas


You can take a look at a new option in town: The futoroscope gem. As you can see by the announcing blog post it tries to solve the same problem you are facing, making simultaneous API query's. It seems to have pretty good support and good test coverage.

like image 34
Paulo Fidalgo Avatar answered Nov 06 '22 04:11

Paulo Fidalgo


Assuming that your problem is a slow external API, a solution could be the use of either threaded programming or asynchronous programming. By default when doing IO, your code will block. This basically means that if you have a method that does an HTTP request to retrieve some JSON your method will tell your operating system that you're going to sleep and you don't want to be woken up until the operating system has a response to that request. Since that can take several seconds, your application will just idly have to wait.

This behavior is not specific to just HTTP requests. Reading from a file or a device such as a webcam has the same implications. Software does this to prevent hogging up the CPU when it obviously has no use of it.

So the question in your case is: Do we really have to wait for one method to finish before we can call another? In the event that the behavior of method_two is dependent on the outcome of method_one, then yes. But in your case, it seems that they are individual units of work without co-dependence. So there is a potential for concurrency execution.

You can start new threads by initializing an instance of the Thread class with a block that contains the code you'd like to run. Think of a thread as a program inside your program. Your Ruby interpreter will automatically alternate between the thread and your main program. You can start as many threads as you'd like, but the more threads you create, the longer turns your main program will have to wait before returning to execution. However, we are probably talking microseconds or less. Let's look at an example of threaded execution.

def main_method
  Thread.new { method_one }
  Thread.new { method_two }
  Thread.new { method_three }
end

def method_one
  # something_slow_that_does_an_http_request
end

def method_two
  # something_slow_that_does_an_http_request
end

def method_three
  # something_slow_that_does_an_http_request
end

Calling main_method will cause all three methods to be executed in what appears to be parallel. In reality they are still being sequentually processed, but instead of going to sleep when method_one blocks, Ruby will just return to the main thread and switch back to method_one thread, when the OS has the input ready.

Assuming each method takes two 2 ms to execute minus the wait for the response, that means all three methods are running after just 6 ms - practically instantly.

If we assume that a response takes 500 ms to complete, that means you can cut down your total execution time from 2 + 500 + 2 + 500 + 2 + 500 to just 2 + 2 + 2 + 500 - in other words from 1506 ms to just 506 ms.

It will feel like the methods are running simultanously, but in fact they are just sleeping simultanously.

In your case however you have a challenge because you have an operation that is dependent on the completion of a set of previous operations. In other words, if you have task A, B, C, D, E and F, then A, B, C, D and E can be performed simultanously, but F cannot be performed until A, B, C, D and E are all complete.

There are different ways to solve this. Let's look at a simple solution which is creating a sleepy loop in the main thread that periodically examines a list of return values to make sure some condition is fullfilled.

def task_1
# Something slow
return results
end

def task_2
# Something slow
return results
end

def task_3
# Something slow
return results
end

my_responses = {}
Thread.new { my_responses[:result_1] = task_1 }
Thread.new { my_responses[:result_2] = task_2 }
Thread.new { my_responses[:result_3] = task_3 }

while (my_responses.count < 3) # Prevents the main thread from continuing until the three spawned threads are done and have dumped their results in the hash.
  sleep(0.1) # This will cause the main thread to sleep for 100 ms between each check. Without it, you will end up checking the response count thousands of times pr. second which is most likely unnecessary.
end

# Any code at this line will not execute until all three results are collected.

Keep in mind that multithreaded programming is a tricky subject with numerous pitfalls. With MRI it's not so bad, because while MRI will happily switch between blocked threads, MRI doesn't support executing two threads simultanously and that solves quite a few concurrency concerns.

If you want to get into multithreaded programming, I recommend this book: http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601

It's centered around Java, but the pitfalls and concepts explained are universal.

like image 3
Niels B. Avatar answered Nov 06 '22 05:11

Niels B.