For a specific task, I need many fast, individual writes to a mutable array. To check the performance, I used the following test:
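import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, writeArray)
import Data.Array.Unboxed (UArray, (!))
import Data.Array.Unsafe (unsafeFreeze)
import Data.List (foldl')

size :: Int
size = 256*256*16

arr :: UArray Int Int
arr = runST $ do
    arr <- newArray (0,size) 0 :: ST s (STUArray s Int Int)
    forM_ [0..size] $ \i -> do
        writeArray arr i i
    unsafeFreeze arr

arr_sum = foldl' (\ sum i -> sum + (arr ! i)) 0 [0..size-1]
main = print arr_sum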
Here is the result:
vh:haskell apple1$ ghc -O3 bench.hs -o bench; time ./bench
Linking bench ...
549755289600
real 0m0.748s
user 0m0.697s
sys 0m0.048s
I suspected it shouldn't take 0.7s to fill a 256*256*16 array in memory, so I tested an equivalent program in JavaScript:
size = 256*256*16;
x = new Array(size);
s = 0;
for (var i=0; i<size; ++i)
    x[i] = i;
for (var i=0; i<size; ++i)
    s += x[i];
console.log(s);
And the result is:
vh:haskell apple1$ time node bench.js
549755289600
real 0m0.175s
user 0m0.150s
sys 0m0.024s
In C, the time was 0.012s, which is a good lower bound.
#include <stdio.h>
#define SIZE (256*256*16)
double x[SIZE];
int main(){
    int i;
    double s = 0;
    for (i = 0; i<SIZE; ++i)
        x[i] = i;
    for (i = 0; i<SIZE; ++i)
        s += x[i];
    printf("%f\n",s);
    return 0;
}
So that pretty much confirms my hypothesis that my Haskell program is doing something other than just filling the array and summing it afterwards. There is probably a hidden stack somewhere, but I cannot identify it, since I used foldl' and forM_, which I believed were compiled to stack-free code. So, what is the source of inefficiency here?
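One way to narrow this down is to take the list traversals and bounds checks out of the picture and see whether the timing changes. The following is only a sketch under assumptions, not a measured fix: the helper names fillFrom, filled, and sumFrom are made up for illustration, the array has exactly size elements (the test above allocates size+1), and the unchecked accessors unsafeWrite and unsafeAt come from Data.Array.Base in the array package.

```haskell
import Control.Monad.ST (ST)
import Data.Array.Base (unsafeAt, unsafeWrite)
import Data.Array.ST (STUArray, newArray, runSTUArray)
import Data.Array.Unboxed (UArray)

size :: Int
size = 256 * 256 * 16

-- Hypothetical helper: write i at index i for every i in [i0 .. n-1],
-- using explicit recursion instead of forM_ over a list.
-- unsafeWrite takes a 0-based offset and performs no bounds check.
fillFrom :: STUArray s Int Int -> Int -> Int -> ST s ()
fillFrom marr n i
  | i >= n    = return ()
  | otherwise = unsafeWrite marr i i >> fillFrom marr n (i + 1)

-- Build the array in ST; runSTUArray freezes it without copying.
filled :: Int -> UArray Int Int
filled n = runSTUArray $ do
  marr <- newArray (0, n - 1) 0
  fillFrom marr n 0
  return marr

-- Hypothetical helper: sum the first n elements with a strict accumulator
-- and unchecked reads.
sumFrom :: UArray Int Int -> Int -> Int -> Int -> Int
sumFrom a n acc i
  | i >= n    = acc
  | otherwise =
      let acc' = acc + unsafeAt a i
      in acc' `seq` sumFrom a n acc' (i + 1)

main :: IO ()
main = print (sumFrom (filled size) size 0 0)
```

If this version runs much faster than the original, the cost is more likely in the [0..size] lists or the checked indexing than in the array writes themselves.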
GHC does not produce nice tight loops like the ones you get from C. In my experience, a factor of 3 in run time is about par for the course.
To get better performance, use the vector library:
import qualified Data.Vector.Unboxed as V

size = 256*256*16 :: Int

doit = V.foldl' (+) 0 vec
  where vec = V.generate size id

main = print doit
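The snippet above builds the vector with generate, so it never performs the individual mutable writes the question asks about. A rough sketch of the same test with explicit writes, using Data.Vector.Unboxed.Mutable from the same vector package, might look like the following. The names fillVec and filledVec are invented for illustration, and no timing is claimed for this version.

```haskell
import Control.Monad.ST (ST)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

size :: Int
size = 256 * 256 * 16

-- Hypothetical helper: write i at index i, from index i up to the end
-- of the mutable vector, one write per step.
fillVec :: MV.MVector s Int -> Int -> ST s ()
fillVec mv i
  | i >= MV.length mv = return ()
  | otherwise         = MV.write mv i i >> fillVec mv (i + 1)

-- V.create runs the ST action and freezes the result into an immutable vector.
filledVec :: Int -> V.Vector Int
filledVec n = V.create $ do
  mv <- MV.new n
  fillVec mv 0
  return mv

main :: IO ()
main = print (V.sum (filledVec size))
```

MV.write checks bounds; MV.unsafeWrite skips the check and can be swapped in once the indices are known to be valid.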