Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why Vector code is slower than standard list one

Tags:

haskell

I have read about Vector library that uses modern optimization techniques and try to compare its perfomance with lists. The code below generates some sound-like data (which is important to my subject area) and sums the result:

import System.Environment (getArgs)
import System.TimeIt
import Data.List
import Data.Vector.Unboxed as V

x1 :: Int -> [Double]
x1 n = [1..(fromIntegral n)]

x2 :: Int -> V.Vector Double
x2 n = V.enumFromN 1 n

osc1 f = Prelude.map (\x -> sin(2*pi*x*f/44100.0))
osc2 f = V.map (\x -> sin(2*pi*x*f/44100.0))

sum1 = Data.List.foldl1' (+)
sum2 = V.foldl1' (+)

zip1 = Prelude.zipWith (+)
zip2 = V.zipWith (+)

main = do s <- getArgs
          let n = read (s !! 0) :: Int
          print "Prelude version"
          timeIt $ print $ sum1 $ zip1 (osc1 55.5 (x1 n)) (osc1 110.0 $ x1 n)
          print "Vector version"
          timeIt $ print $ sum2 $ zip2 (osc2 55.5 (x2 n)) (osc2 110.0 $ x2 n)

GHC 7.6.3 running on win7 with vector0.10.0.1 and timeit1.0.0.0 gave me these results:

c:\coding>test 10000000
"Prelude version"
90.98579564908658
CPU time:   9.92s
"Vector version"
90.98579564908658
CPU time:  11.03s

Vector version is a little bit slower even being Unboxed and it takes 22.67 seconds for the boxed Vector version. Why does this happen? How should I write this code to get maximum perfomace?

UPD. After adding -O2 (**) I got more clear to me results. It looks like boxed vectors are harder to fuse.

                  List    Vector.Unboxed    Vector
ghc test.hs       9.78    10.94             21.95
ghc test.hs -O2   3.39    1.25              7.57

(**) I didn't note because ghc doesn't recompile unchanged files even if command line flags are different and I hadn't actually run -O2 version before noticed that. Sorry

like image 556
Dmitry Galchinsky Avatar asked Dec 25 '22 19:12

Dmitry Galchinsky


2 Answers

It's a matter of optimization flags:

-o0

>test 10000000
"Prelude version"
90.98579564908658
CPU time:   6.66s
"Vector version"
90.98579564908658
CPU time:   8.27s

-o1

>test 10000000
"Prelude version"
90.98579565011536
CPU time:   2.70s
"Vector version"
90.98579565011924
CPU time:   1.62s

-o2

>test 10000000
"Prelude version"
90.98579565011536
CPU time:   2.72s
"Vector version"
90.98579565011924
CPU time:   1.34s

From Haskell tag info:

Performance issues

In case of performance issues, please make sure that you compile your code with optimizations enabled. Passing -O2 makes many performance issues go away.

update

For a quick explanation on why Unboxed is faster here's one:

The most flexible type is Data.Vector.Vector, which provides boxed arrays: arrays of pointers to Haskell values.

These arrays are suitable for storing complex Haskell types (sum types, or algebraic data types), but a better choice for simple data types is Data.Vector.Unboxed.

For Unboxed:

Simple, atomic types, and pair types can be stored in a more efficient manner: consecutive memory slots without pointers.

[off] Optimization's slightly changing the result, that's interesting. [/off]

like image 84
MdxBhmt Avatar answered Jan 12 '23 01:01

MdxBhmt


I'd say that the vector version is forced to actually materialize the vector (allocating memory for it) and working with that much like an implementation using for loops and arrays in an imperative setting. In a sense it "does what one would expect" (when having an imperative background at least).

But there's something interesting happening with the version using lists, and this magic is called "stream fusion": The compiler is smart enough to figure out that it is sufficient to keep track of the sum in order to compute the final result. This is done by computing values and adding them up, finally printing out the sum. The actual list is never needed at all, so it's never allocated or traversed.

I did not verify this by looking at the generated Core, so...

like image 45
Waldheinz Avatar answered Jan 12 '23 00:01

Waldheinz