 

Clojure - executing a bunch of HTTP requests in parallel - pmap?

I need to make 200 or so HTTP requests. I want them to run in parallel, or in batches, and I'm not sure where to start with this in Clojure. pmap appears to have the effect I want; for example, using http.async.client:

(require '[http.async.client :as http])

(defn get-json [url]
  (with-open [client (http/create-client)]
    (let [resp (http/GET client url)]
      (try
        (println 1)
        (http/string (http/await resp)) ; body is read, but discarded in this test
        (println "********DONE*********")
        nil
        (catch Exception e (println e) {})))))


music.core=> (pmap get-json [url url2])
1
1
********DONE*********
********DONE*********
(nil nil)

But I can't prove that the requests are actually executing in parallel. Do I need to call into the JVM's Thread APIs? Searching around, I'm coming up with other libraries like Netty, Lamina and Aleph; should I be using one of these? Please just point me in the right direction for learning the best practice or simplest solution.

asked Jan 30 '14 by Rob Lourens



2 Answers

Ideally you don't want to tie up a thread waiting for the result of each HTTP request, so pmap or other thread-based approaches aren't really a good idea.

What you really want to do is:

  • Fire off all the requests asynchronously
  • Wait for the results with just one thread

My suggested approach is to use http-kit to fire off all the asynchronous requests at once, producing a sequence of promises. You then just need to dereference all these promises in a single thread, which will block the thread until all results are returned.

Something like:

(require '[org.httpkit.client :as http])

(let [urls     (repeat 100 "http://google.com") ;; insert your URLs here
      promises (mapv http/get urls)   ;; fire off every request without waiting
      results  (mapv deref promises)] ;; block this one thread until all are done
  #_do_stuff_with_results
  (first results))
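The same "start every request, then deref the promises on one thread" pattern can be tried without network access. This is a minimal sketch, not http-kit itself: the hypothetical `fake-async-get` stands in for an async client by scheduling every response on a single `ScheduledExecutorService` thread, so no thread sits blocked per request. The names `fake-async-get` and `fetch-all` are illustrative assumptions.

```clojure
(import '(java.util.concurrent Executors ScheduledExecutorService TimeUnit))

(defn fake-async-get
  "HYPOTHETICAL stand-in for an async HTTP call such as http-kit's http/get:
  returns a promise immediately and delivers a fake response map from the
  scheduler thread after delay-ms milliseconds."
  [^ScheduledExecutorService scheduler url delay-ms]
  (let [p (promise)
        ^Runnable respond (fn [] (deliver p {:url url :status 200}))]
    (.schedule scheduler respond (long delay-ms) TimeUnit/MILLISECONDS)
    p))

(defn fetch-all
  "Fires every (fake) request first, then blocks only the calling thread
  until all responses have been delivered. Results keep the order of urls."
  [urls delay-ms]
  ;; A single scheduler thread completes every request; nothing blocks per URL.
  (let [scheduler (Executors/newSingleThreadScheduledExecutor)]
    (try
      (let [promises (mapv #(fake-async-get scheduler % delay-ms) urls)] ; fire all
        (mapv deref promises))                                            ; wait on one thread
      (finally
        (.shutdown scheduler)))))
```

Calling `(fetch-all (map #(str "http://example.com/" %) (range 100)) 200)` should return 100 response maps in roughly 200 ms rather than the ~20 s a sequential loop would need, because all the delays overlap while only one thread waits.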
answered Nov 09 '22 by mikera



What you're describing is a perfectly good use of pmap, and I'd approach it in a similar fashion.

As for 'proving' that it runs in parallel: pmap wraps each call in a future, so the calls run on pooled worker threads rather than on your REPL thread. A simple sanity check is to return the thread ID from each call:

user=> (defn thread-id [_] (.getId (Thread/currentThread)))

user=> (pmap thread-id [1 2 3])

(53 11 56)

Because the thread IDs differ, the calls ran on separate threads, so the JVM is free to run them in parallel. Note that Clojure's futures draw from a cached thread pool, so threads may be reused rather than created fresh for each call.
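Different thread IDs show that the calls ran on separate threads, but not that they overlapped in time. A more direct check is to time a blocking workload run sequentially and through pmap. In this minimal sketch, the hypothetical `slow-call` sleeps to stand in for a blocking HTTP request, and `elapsed-ms` is an illustrative helper; with four 250 ms calls, the pmap version should finish in roughly a quarter of the sequential time.

```clojure
(defn slow-call
  "HYPOTHETICAL stand-in for a blocking HTTP request: sleeps ms milliseconds."
  [ms]
  (Thread/sleep (long ms))
  ms)

(defn elapsed-ms
  "Calls f, fully realizes the (possibly lazy) sequence it returns,
  and returns the elapsed wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (doall (f))
    (/ (- (System/nanoTime) start) 1e6)))

;; Four 250 ms calls: one after another, then through pmap.
(def sequential-ms (elapsed-ms #(map slow-call (repeat 4 250))))
(def parallel-ms   (elapsed-ms #(pmap slow-call (repeat 4 250))))

(println "sequential:" sequential-ms "ms, pmap:" parallel-ms "ms")
```

Keep in mind that pmap is semi-lazy: it only runs a bounded number of calls ahead of whoever consumes the result (on the order of the number of available processors plus 2 in the current implementation), so with 200 slow requests it behaves more like a sliding batch than 200 simultaneous calls.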

Also have a look at other parallel functions such as pvalues and pcalls. pcalls evaluates a set of no-argument functions in parallel, and pvalues does the same for a set of expressions; depending on the problem at hand, one of them may fit better than pmap.
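A minimal sketch of the difference between the two: pcalls takes zero-argument functions, while pvalues is a macro that takes the expressions directly. Both return a lazy sequence of results in argument order. The `slow-value` helper is a hypothetical stand-in for a blocking request.

```clojure
(defn slow-value
  "HYPOTHETICAL stand-in for a blocking request: sleeps 200 ms, then returns x."
  [x]
  (Thread/sleep 200)
  x)

;; pcalls: pass zero-argument functions; each one is called in parallel.
(def from-pcalls
  (doall (pcalls #(slow-value :a) #(slow-value :b) #(slow-value :c))))

;; pvalues: pass the expressions themselves; each one is evaluated in parallel.
(def from-pvalues
  (doall (pvalues (slow-value 1) (slow-value 2) (slow-value 3))))

(println from-pcalls from-pvalues)
```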

answered Nov 09 '22 by leonardoborges
