Efficient way to walk collections in parallel with doseq (or for)?

Question

(doseq [e coll1]
  (myfunc e))

is very fast, if all you care about are side effects. What if I want myfunc to take elements from multiple collections "in parallel", i.e. apply myfunc to the first elements of each collection, then to all of the second elements, then to all of the third elements, etc.? Note that this is as much a question about the functionality of for as doseq, but if one wants a sequence as output, map will do what's needed, so for isn't necessary.

(doseq [e1 coll1
        e2 coll2]
   (myfunc e1 e2))

will instead apply myfunc to all possible combinations of elements from the two collections. If I knew in advance what the elements of the collections would be, I could use a :when test to combine only certain elements, but suppose that I don't know that?
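To make the Cartesian-product behavior concrete, here is a small sketch. The `calls` atom and the sample vectors are made up for illustration; they only record which argument pairs a two-binding doseq produces.

```clojure
;; Hypothetical demo data: `calls` exists only to record what doseq does.
(def calls (atom []))

(doseq [e1 [1 2]
        e2 [:a :b]]
  (swap! calls conj [e1 e2]))

;; @calls now holds all four combinations, not the two parallel pairs:
;; [[1 :a] [1 :b] [2 :a] [2 :b]]
```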

One solution is to create n-tuples to avoid the Cartesian product, but that is time-consuming, which removes the speed advantage of using doseq in the first place:

(let [argvecs (map vector coll1 coll2)] ; seq of n-tuples, one element from each coll
  (doseq [args argvecs]
    (apply myfunc args)))

(This can be about 8X slower than a single-collection doseq. See times for domap1 and domap17 at the end of this question.)

amalloy · Accepted Answer

If you want to avoid the overhead of creating tuples with map, all you can do is write it yourself, as a loop/recur that walks each collection manually. But really, you'll still end up needing to create a tuple so that you can (apply f args), where args is the nth item of each collection. You'll save a few cons cells by not making a list of such tuples, but that's all. A lot of the expense of variadic functions like this is calling apply, and building the lists to do that with. You can avoid that by writing a 2-arity version of your doseq-sibling, and a 3-arity, and... But the n-arity version will always be slower.
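As a rough sketch of the fixed-arity idea (the name `doseq2` is made up here, it is not a core function), a two-collection version could walk both seqs with loop/recur and call f directly, so no tuple and no apply are involved:

```clojure
(defn doseq2
  "Hypothetical helper: calls (f x y) on paired elements of coll1 and coll2
  for side effects, stopping at the end of the shorter collection.
  Returns nil."
  [f coll1 coll2]
  (loop [s1 (seq coll1)
         s2 (seq coll2)]
    (when (and s1 s2)
      (f (first s1) (first s2))
      (recur (next s1) (next s2)))))

;; Example use: prints 11, 22, 33 on separate lines.
(doseq2 (fn [x y] (println (+ x y))) [1 2 3] [10 20 30])
```

A 3-arity sibling would follow the same pattern with a third seq. Whether this actually beats (dorun (map f coll1 coll2)) is something to measure rather than assume.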

claj · Answer

If it's speed you're after, you should turn on reflection warnings and maybe check out the loop primitive (recurring with (rest coll1) (rest coll2))...

Also check out "Clojure is still fast" and the performance-testing framework Criterium to make sure you are measuring the right thing.

A. Webb · Answer

Use (dorun (map f coll1 coll2 ..)) or, if the collections are themselves held in a sequence colls, (dorun (apply map f colls)).
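A small sketch of both forms, using a made-up side-effecting function `record!` and an atom `seen` (neither is from the answer). When the collections are held in a sequence, the call is spelled (apply map f colls):

```clojure
;; Hypothetical accumulator and side-effecting function for the demo.
(def seen (atom []))
(defn record! [& args] (swap! seen conj (vec args)))

;; Fixed number of collections: map walks them in parallel,
;; dorun forces the lazy seq and discards the results.
(dorun (map record! [1 2 3] [:a :b :c]))

;; Collections held in a sequence: spread them with apply.
(def colls [[1 2] [10 20] [100 200]])
(dorun (apply map record! colls))

;; @seen is now [[1 :a] [2 :b] [3 :c] [1 10 100] [2 20 200]]
```

Like map itself, both calls stop at the end of the shortest collection.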

The more you ask of f, the longer it is going to take.

(def a (atom 0))
(defn f [& args] (swap! a #(apply + % args)))
(def N 10000)

On a single collection, use doseq. The map version carries overhead from the lazy-seq structure, which doseq avoids.

(bench (doseq [e (range N)] (f e)))
Execution time mean : 4.959713 ms

(bench (dorun (map f (range N))))
Execution time mean : 5.669721 ms

On two collections, note f has to add twice instead of once, so I would expect this to take twice as long. Note now both versions have some structural overhead.

(bench (let [argvecs (map vector (range N) (range N))] 
  (doseq [e argvecs] (apply f e))))
Execution time mean : 11.876843 ms

(bench (dorun (map f (range N) (range N))))
Execution time mean : 11.145435 ms

Tags:

clojure

Mars
