The documentation on the pmap
function leaves me wondering how efficient it would be for something like fetching a collection of XML feeds over the web. I have no idea how many concurrent fetch operations pmap would spawn and what the maximum would be.
If you check the source you see:
> (use 'clojure.repl)
> (source pmap)
(defn pmap
"Like map, except f is applied in parallel. Semi-lazy in that the
parallel computation stays ahead of the consumption, but doesn't
realize the entire result unless required. Only useful for
computationally intensive functions where the time of f dominates
the coordination overhead."
{:added "1.0"}
([f coll]
(let [n (+ 2 (.. Runtime getRuntime availableProcessors))
rets (map #(future (f %)) coll)
step (fn step [[x & xs :as vs] fs]
(lazy-seq
(if-let [s (seq fs)]
(cons (deref x) (step xs (rest s)))
(map deref vs))))]
(step rets (drop n rets))))
([f coll & colls]
(let [step (fn step [cs]
(lazy-seq
(let [ss (map seq cs)]
(when (every? identity ss)
(cons (map first ss) (step (map rest ss)))))))]
(pmap #(apply f %) (step (cons coll colls))))))
The (+ 2 (.. Runtime getRuntime availableProcessors))
is a big clue there. pmap will grab the first (+ 2 processors)
pieces of work and run them asynchronously via future
. So if you have 2 cores, it's going to launch 4 pieces of work at a time, trying to keep a bit ahead of you but the max should be 2+n.
future
ultimately uses the agent I/O thread pool which supports an unbounded number of threads. It will grow as work is thrown at it and shrink if threads are unused.
Building on Alex's excellent answer that explains how pmap works, here's my suggestion for your situation:
(doall
(map
#(future (my-web-fetch-function %))
list-of-xml-feeds-to-fetch))
Rationale:
No time to write a long response, but there's a clojure.contrib http-agent which creates each get/post request as its own agent. So you can fire off a thousand requests and they'll all run in parallel and complete as the results come in.
Looking the operation of pmap, it seems to go 32 threads at a time no mater what number of processors you have, the issue is that map will go ahead of the computation by 32 and the futures are started in their own. (SAMPLE)
(defn samplef [n]
(println "starting " n)
(Thread/sleep 10000)
n)
(def result (pmap samplef (range 0 100)))
; you will wait for 10 seconds and see 32 prints then when you take the 33rd an other 32 ; prints this mins that you are doing 32 concurrent threads at a time ; to me this is not perfect ; SALUDOS Felipe
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With