I have read about Vector
library that uses modern optimization techniques and try to compare its perfomance with lists. The code below generates some sound-like data (which is important to my subject area) and sums the result:
import System.Environment (getArgs)
import System.TimeIt
import Data.List
import Data.Vector.Unboxed as V
x1 :: Int -> [Double]
x1 n = [1..(fromIntegral n)]
x2 :: Int -> V.Vector Double
x2 n = V.enumFromN 1 n
osc1 f = Prelude.map (\x -> sin(2*pi*x*f/44100.0))
osc2 f = V.map (\x -> sin(2*pi*x*f/44100.0))
sum1 = Data.List.foldl1' (+)
sum2 = V.foldl1' (+)
zip1 = Prelude.zipWith (+)
zip2 = V.zipWith (+)
main = do s <- getArgs
let n = read (s !! 0) :: Int
print "Prelude version"
timeIt $ print $ sum1 $ zip1 (osc1 55.5 (x1 n)) (osc1 110.0 $ x1 n)
print "Vector version"
timeIt $ print $ sum2 $ zip2 (osc2 55.5 (x2 n)) (osc2 110.0 $ x2 n)
GHC 7.6.3 running on win7 with vector0.10.0.1 and timeit1.0.0.0 gave me these results:
c:\coding>test 10000000
"Prelude version"
90.98579564908658
CPU time: 9.92s
"Vector version"
90.98579564908658
CPU time: 11.03s
Vector version is a little bit slower even being Unboxed
and it takes 22.67 seconds for the boxed Vector version. Why does this happen? How should I write this code to get maximum perfomace?
UPD. After adding -O2
(**) I got more clear to me results. It looks like boxed vectors are harder to fuse.
List Vector.Unboxed Vector
ghc test.hs 9.78 10.94 21.95
ghc test.hs -O2 3.39 1.25 7.57
(**) I didn't note because ghc doesn't recompile unchanged files even if command line flags are different and I hadn't actually run -O2
version before noticed that. Sorry
It's a matter of optimization flags:
-o0
>test 10000000
"Prelude version"
90.98579564908658
CPU time: 6.66s
"Vector version"
90.98579564908658
CPU time: 8.27s
-o1
>test 10000000
"Prelude version"
90.98579565011536
CPU time: 2.70s
"Vector version"
90.98579565011924
CPU time: 1.62s
-o2
>test 10000000
"Prelude version"
90.98579565011536
CPU time: 2.72s
"Vector version"
90.98579565011924
CPU time: 1.34s
From Haskell tag info:
Performance issues
In case of performance issues, please make sure that you compile your code with optimizations enabled. Passing -O2 makes many performance issues go away.
update
For a quick explanation on why Unboxed is faster here's one:
The most flexible type is Data.Vector.Vector, which provides boxed arrays: arrays of pointers to Haskell values.
These arrays are suitable for storing complex Haskell types (sum types, or algebraic data types), but a better choice for simple data types is Data.Vector.Unboxed.
For Unboxed:
Simple, atomic types, and pair types can be stored in a more efficient manner: consecutive memory slots without pointers.
[off] Optimization's slightly changing the result, that's interesting. [/off]
I'd say that the vector
version is forced to actually materialize the vector (allocating memory for it) and working with that much like an implementation using for
loops and arrays in an imperative setting. In a sense it "does what one would expect" (when having an imperative background at least).
But there's something interesting happening with the version using lists, and this magic is called "stream fusion": The compiler is smart enough to figure out that it is sufficient to keep track of the sum in order to compute the final result. This is done by computing values and adding them up, finally printing out the sum. The actual list is never needed at all, so it's never allocated or traversed.
I did not verify this by looking at the generated Core, so...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With