I wanted to do some real-time graphics rendering and was trying to do multiple calculations per pixel per frame. I then quickly noticed that this was very slow and started at the very base: how fast can I loop over all the pixels?
I found dotimes reasonably fast, but when I do this in a REPL it is awfully slow:
user=> (dotimes [_ 10] (time (dotimes [_ 1e7] (+ 1 1))))
"Elapsed time: 409.177477 msecs"
"Elapsed time: 417.755502 msecs"
"Elapsed time: 418.939182 msecs"
"Elapsed time: 420.131575 msecs"
"Elapsed time: 419.83529 msecs"
"Elapsed time: 417.612003 msecs"
"Elapsed time: 420.749229 msecs"
"Elapsed time: 418.918554 msecs"
"Elapsed time: 414.403957 msecs"
"Elapsed time: 417.729624 msecs"
nil
user=>
Then I put this into a Leiningen project. When I do a "lein run" it is just as slow. But when I create the uberjar and run it with the java command it's a lot faster:
% java -jar target/looping-0.1.0-SNAPSHOT-standalone.jar
"Elapsed time: 122.006758 msecs"
"Elapsed time: 3.667653 msecs"
"Elapsed time: 3.60515 msecs"
"Elapsed time: 4.008436 msecs"
"Elapsed time: 3.961558 msecs"
"Elapsed time: 3.60212 msecs"
"Elapsed time: 3.592532 msecs"
"Elapsed time: 4.573949 msecs"
"Elapsed time: 3.959568 msecs"
"Elapsed time: 3.607495 msecs"
The first run is still a lot slower, though. What is the difference? In both cases the code is compiled; there is no interpreted Clojure, right? Is it the JIT, some optimizations, or some special JVM options that are set for the REPL?
Thanks for any ideas.
Leiningen runs the JVM with certain default options that improve startup time but impair runtime performance. So, you might want to check again with :jvm-opts ^:replace [] added to your project.clj.
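For reference, here is a minimal sketch of where that option would go. The project name and version are taken from the jar name in the question; the Clojure version and the main namespace (looping.core) are assumptions, so adjust them to your actual project. As far as I know, the defaults being replaced have included tiered-compilation flags such as -XX:TieredStopAtLevel=1 in Leiningen 2.x, but check your own version.

```clojure
;; project.clj (sketch; dependency version and :main namespace are assumptions)
(defproject looping "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  ;; hypothetical namespace containing -main
  :main looping.core
  ;; ^:replace discards Leiningen's default JVM options
  ;; instead of merging this (empty) vector with them
  :jvm-opts ^:replace [])
```

With this in place, rerunning the same timing loop under lein run or lein repl should show whether the default JVM options account for the gap.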
Apart from this, while the below doesn't add anything in the way of explaining the timing discrepancy between the REPL and the überjar, I thought I'd comment on benchmarking in case you care about accurate results:
The time macro is not a good tool for benchmarking, whether with dotimes or not. (Without dotimes, the JIT compiler will not kick in; with dotimes, it probably will, but it may well decide that the body of the loop is a no-op and optimize it away completely.)
Hugo Duncan's Criterium is the robust Clojure solution that takes care of JIT warm-up, looping in a way which will not be optimized away and statistical processing of the results. A simple Criterium benchmark might look like this:
(require '[criterium.core :as c])
(def v [0 1 2])
(c/bench (nth v 0))
(This measures the time to access the initial element of a short vector held in a Var. I'd expect (+ 1 1) eventually to be compiled to a constant, so there may be nothing left to benchmark.)