I have a small set of functions. Two of them perform a mathematical overlay operation (defined at http://docs.gimp.org/en/gimp-concepts-layer-modes.html, a little way down the page; just search for "overlay" to find the math) in different ways. GIMP performs this operation very quickly, in under a second, but I can't seem to optimize my code to get anywhere near that time.
(My application is a GUI application to help me see and compare various overlay combinations of a large number of files. The Gimp layer interface actually makes it rather difficult to just pick two images to overlay, then pick a different two, etc.)
Here is the code:
(set! *warn-on-reflection* true)

(defn to-8-bit [v]
  (short (* (/ v 65536) 256)))

(defn overlay-sample [base-p over-p]
  (to-8-bit
    (* (/ base-p 65536)
       (+ base-p
          (* (/ (* 2 over-p) 65536)
             (- 65536 base-p))))))

(defn overlay-map [^shorts base ^shorts over]
  (let [ovl (time (doall (map overlay-sample ^shorts base ^shorts over)))]
    (time (into-array Short/TYPE ovl))))

(defn overlay-array [base over]
  (let [ovl (time (amap base
                        i
                        r
                        (int (overlay-sample (aget r i)
                                             (aget over i)))))]
    ovl))
overlay-map and overlay-array do the same operation in different ways. I've written other versions of this operation, too. However, overlay-map is, by far, the fastest I have.
base and over, in both functions, are 16-bit integer arrays. The actual size of each is 1,276,800 samples (an 800 x 532 image with 3 samples per pixel). The end result should be a single array of the same size, but scaled down to 8 bits per sample.
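To make the arithmetic concrete, here is a small self-contained sketch that evaluates the two helper functions from the code above on single samples at the REPL. The sample values are arbitrary and chosen only for illustration. It also shows that, with integer inputs, Clojure's `/` returns an exact Ratio rather than a primitive number.

```clojure
;; Same definitions as in the question, repeated so the snippet runs on its own.
(defn to-8-bit [v]
  (short (* (/ v 65536) 256)))

(defn overlay-sample [base-p over-p]
  (to-8-bit
    (* (/ base-p 65536)
       (+ base-p
          (* (/ (* 2 over-p) 65536)
             (- 65536 base-p))))))

;; Integer division in Clojure is exact: the result is a Ratio, not a double.
(/ 1000 65536)               ;=> 125/8192
(ratio? (/ 1000 65536))      ;=> true

;; Darkest and brightest 16-bit samples map to the ends of the 8-bit range.
(overlay-sample 0 0)         ;=> 0
(overlay-sample 65535 65535) ;=> 255
```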
My results from the (time) operation are pretty consistent. overlay-map runs the actual mathematical operation in about 16 or 17 seconds, then spends another 5 seconds copying the resulting sequence back into an integer array.
overlay-array takes about 111 seconds.
I've done a lot of reading about using arrays, type hints, etc., but my Java-array-only operation is amazingly slow! amap, aget, etc. were all supposed to be fast, but I have read the code and there is nothing that looks like a speed optimization there, and my results are consistent. I've even tried other computers and seen roughly the same difference.
Now, 16-17 seconds is actually rather painful for a data set of this size, but I've been caching the results so that I can easily switch back and forth. The same operation would take an atrociously long time if I increased the size of the data set to anything like a full-size image (4770 x 3177). And there are other operations I want to be doing, too.
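For comparison, here is a sketch of what a fully type-hinted array version might look like. It rests on several assumptions that are not in the code above: the samples are held in int arrays (values up to 65535 do not fit in a signed short), Clojure 1.3 or later is available for primitive `^long` hints, and the names `overlay-sample-long` and `overlay-ints` are hypothetical. The formula uses truncating long arithmetic via `quot`, so a result may occasionally differ by one from the exact rational computation. It is untimed and not a definitive implementation.

```clojure
(set! *warn-on-reflection* true)

;; Hypothetical helper: the overlay formula using only long arithmetic.
;; quot truncates, so rounding can differ slightly from the Ratio-based version.
(defn overlay-sample-long ^long [^long base-p ^long over-p]
  (let [inner (+ base-p
                 (quot (* (* 2 over-p) (- 65536 base-p)) 65536))]
    ;; Dividing by 65536 applies the outer scale; dividing by 256 maps to 8 bits.
    (quot (* base-p inner) (* 65536 256))))

;; Hypothetical array version: both inputs are assumed to be int arrays
;; of equal length; the result is a new int array of 8-bit values.
(defn overlay-ints ^ints [^ints base ^ints over]
  (let [n (alength base)
        ^ints out (int-array n)]
    (loop [i 0]
      (when (< i n)
        (aset out i (int (overlay-sample-long (aget base i) (aget over i))))
        (recur (inc i))))
    out))

;; Usage on a tiny two-sample "image":
(vec (overlay-ints (int-array [0 65535]) (int-array [0 65535])))
;=> [0 255]
```

With the array parameters hinted as `^ints`, the `aget`, `aset`, and `alength` calls can be resolved at compile time, so `*warn-on-reflection*` should report nothing for this code.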
So, any suggestions on how to speed this up? What am I missing here?
UPDATE: I just made the entire project containing this code public, so you can see the current version of the entire script I am using for speed tests at https://bitbucket.org/savannidgerinel/hdr-darkroom/src/62a42fcf6a4b/scripts/speed_test.clj . Feel free to download it and try it on your own gear, but obviously change the image file paths before running it.
Since your functions are purely mathematical, you might want to check out memoize:
(def fast-overlay (memoize overlay-sample))
(time (fast-overlay 1000 2000))
"Elapsed time: 1.279 msecs"
(time (fast-overlay 1000 2000))
"Elapsed time: 0.056 msecs"
What's happening here is that the arguments are cached as the key and the return value is stored as the value. When a result has already been computed for a given set of arguments, the cached value is returned instead of the function being executed again.
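The caching behaviour described above can be sketched in a few lines. This is a simplified stand-in for illustration, not the library's actual source; `simple-memoize`, `slow-add`, `fast-add`, and `calls` are hypothetical names.

```clojure
;; Hypothetical, simplified version of the idea behind memoize:
;; a map from argument lists to previously computed results.
(defn simple-memoize [f]
  (let [cache (atom {})]
    (fn [& args]
      (if-let [entry (find @cache args)]
        (val entry)                        ; cache hit: skip the call
        (let [result (apply f args)]       ; cache miss: compute and store
          (swap! cache assoc args result)
          result)))))

;; Count how often the underlying function really runs.
(def calls (atom 0))

(defn slow-add [a b]
  (swap! calls inc)
  (+ a b))

(def fast-add (simple-memoize slow-add))

(fast-add 1 2) ; computed
(fast-add 1 2) ; served from the cache
@calls         ;=> 1
```

Note that a cache like this only helps when the same argument pairs recur, and it keeps one entry for every distinct pair it has seen.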