I have a haskell function I want to evaluate with exact intermediate results:
f 0 x = 0
f n x = let tmp = f (n-1) x in
tmp + (x-tmp^2)/2
Because of the (^2) the complexity grows exponentially in n. Since I want to do a plot and the computations for two different x are completely independant, I would have expected nearly optimal speedup from parallel evaluation. My code for this:
import Data.Ratio
import Control.Parallel.Strategies
f 0 x = 0
f n x = let tmp = f (n-1) x in
tmp + (x-tmp^2)/2
main = do
it <- readLn
let fn = fromRational . f it
values = map fn [0,1%2..10] :: [Double]
computed = values `using` parBuffer 16 rseq
mapM_ (putStrLn . show) computed
But to my surprise this does not really scale (on my dual core i3 with HT):
$ ghc -threaded -O f.hs
[1 of 1] Compiling Main ( f.hs, f.o )
Linking f ...
$ time echo 20 | (./f +RTS -N1 > /dev/null)
real 0m4.760s
user 0m4.736s
sys 0m0.016s
$ time echo 20 | (./f +RTS -N2 > /dev/null)
real 0m4.041s
user 0m5.416s
sys 0m2.548s
$ time echo 20 | (./f +RTS -N3 > /dev/null)
real 0m4.884s
user 0m10.936s
sys 0m3.464s
$ time echo 20 | (./f +RTS -N4 > /dev/null)
real 0m5.536s
user 0m17.028s
sys 0m3.888s
What am I doing wrong here? It looks like it spends quite some time in locks (sys?) instead of doing useful work.
I think that as the overall runtime is relatively small, you're suffering a lot from initial resizing of the heap during garbage collections. You can try making the initial allocation area larger by passing +RTS -A100M
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With