
Parallel HTTP requests in Ruby

I have an array of URLs and I want to open each one and fetch a specific tag.
But I want to do this in parallel.

Here is the pseudocode for what I want to do:

urls = [...]
tags = []
urls.each do |url|
  fetch_tag_asynchronously(url) do |tag|
    tags << tag
  end
end
wait_for_all_requests_to_finish()

If this could be done in a nice and safe way that would be awesome.
I could use threads, but it doesn't look like arrays are thread-safe in Ruby.

Nicklas A. asked Jan 08 '12 15:01



2 Answers

You can achieve thread-safety by using a Mutex:

require 'thread'  # for Mutex

urls = %w(
  http://test1.example.org/
  http://test2.example.org/
  ...
)

threads = []
tags = []
tags_mutex = Mutex.new

urls.each do |url|
  threads << Thread.new(url, tags) do |url, tags|
    tag = fetch_tag(url)
    tags_mutex.synchronize { tags << tag }
  end
end

threads.each(&:join)
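
If the shared array is the main worry, one possible variation is to let each thread return its tag and collect the results with Thread#value, so that no thread ever writes to shared state. The sketch below is only an illustration: `fetch_all` is a hypothetical helper name, and the block passed to it stands in for `fetch_tag`, which the answer leaves undefined.

```ruby
# Hypothetical helper (not part of the answer above): runs `fetcher` once per
# URL, each call in its own thread, and returns the results in the same order
# as `urls`. No Mutex is needed because no thread touches shared state.
def fetch_all(urls, &fetcher)
  threads = urls.map do |url|
    Thread.new(url) { |u| fetcher.call(u) }
  end
  # Thread#value waits for the thread to finish and returns its block's result.
  # If the block raised, the exception is re-raised here.
  threads.map(&:value)
end

# Usage, assuming a fetch_tag method like the one used in the answer:
#   tags = fetch_all(urls) { |url| fetch_tag(url) }
```

This still starts one thread per URL, so it shares that limitation with the code above.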

It could, however, be counterproductive to start a new thread for every URL, so capping the number of threads like this may perform better:

THREAD_COUNT = 8  # tweak this number for maximum performance.

tags = []
mutex = Mutex.new

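# Each worker pops URLs off the shared array until none are left.
# Note that this empties `urls`.
THREAD_COUNT.times.map {
  Thread.new(urls, tags) do |urls, tags|
    # The same mutex guards both the URL list and the results array.
    while url = mutex.synchronize { urls.pop }
      tag = fetch_tag(url)
      mutex.synchronize { tags << tag }
    end
  end
}.each(&:join)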
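
The same fixed-size pool could also be built on Ruby's thread-safe Queue class instead of an Array plus a Mutex. This is a minimal sketch under a few assumptions: `fetch_all_pooled` and its `thread_count` keyword are hypothetical names, the block stands in for `fetch_tag`, and Queue#close requires Ruby 2.3 or newer.

```ruby
# Hypothetical helper (not from the answer above): processes `urls` with at
# most `thread_count` worker threads and returns [url, result] pairs in
# completion order. The caller's `urls` array is left untouched.
def fetch_all_pooled(urls, thread_count: 8, &fetcher)
  jobs = Queue.new
  urls.each { |url| jobs << url }
  jobs.close # once closed and drained, pop returns nil instead of blocking

  results = Queue.new # Queue is thread-safe, so no explicit Mutex is needed

  workers = Array.new(thread_count) do
    Thread.new do
      while (url = jobs.pop)
        results << [url, fetcher.call(url)]
      end
    end
  end
  workers.each(&:join)

  # All workers are done, so draining the results queue cannot block.
  Array.new(results.size) { results.pop }
end

# Usage, assuming a fetch_tag method like the one used in the answer:
#   tags = fetch_all_pooled(urls, thread_count: 8) { |url| fetch_tag(url) }.map(&:last)
```

Returning the URL alongside each result is a design choice made here because results arrive in completion order, not input order.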
Niklas B. answered Sep 19 '22 18:09



The Typhoeus gem, together with its Hydra interface, is designed to do exactly this: you queue the requests on a Hydra and it runs them in parallel. It's very convenient and powerful.

the Tin Man answered Sep 20 '22 18:09
