I just started reading Parallel and Concurrent Programming in Haskell.
I wrote two programs that, I believe, sum up a list in two ways: one using a single rpar (force (sum list)) spark on the whole list, and one that splits the list in half and sparks the sum of each half.
Here's the code:
import Control.Parallel.Strategies
import Control.DeepSeq
import System.Environment

main :: IO ()
main = do
  [n] <- getArgs
  [single, faster] !! (read n - 1)

single :: IO ()
single = print . runEval $ rpar (sum list)

faster :: IO ()
faster = print . runEval $ do
  let (as, bs) = splitAt ((length list) `div` 2) list
  res1 <- rpar (sum as)
  res2 <- rpar (sum bs)
  return (res1 + res2)

list :: [Integer]
list = [1..10000000]
Compile with parallelization enabled (-threaded)
C:\Users\k\Workspace\parallel_concurrent_haskell>ghc Sum.hs -O2 -threaded -rtsopts
[1 of 1] Compiling Main ( Sum.hs, Sum.o )
Linking Sum.exe ...
Results of the single program
C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 1 +RTS -s -N2
50000005000000
960,065,896 bytes allocated in the heap
363,696 bytes copied during GC
43,832 bytes maximum residency (2 sample(s))
57,016 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1837 colls, 1837 par 0.00s 0.01s 0.0000s 0.0007s
Gen 1 2 colls, 1 par 0.00s 0.00s 0.0002s 0.0003s
Parallel GC work balance: 0.18% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 1 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.27s ( 0.27s elapsed)
GC time 0.00s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.27s ( 0.28s elapsed)
Alloc rate 3,614,365,726 bytes per MUT second
Productivity 100.0% of total user, 95.1% of total elapsed
gc_alloc_block_sync: 573
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
Run with faster
C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 2 +RTS -s -N2
50000005000000
1,600,100,336 bytes allocated in the heap
1,477,564,464 bytes copied during GC
400,027,984 bytes maximum residency (14 sample(s))
70,377,336 bytes maximum slop
911 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 3067 colls, 3067 par 1.05s 0.68s 0.0002s 0.0021s
Gen 1 14 colls, 13 par 1.98s 1.53s 0.1093s 0.5271s
Parallel GC work balance: 0.00% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 2 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.38s ( 1.74s elapsed)
GC time 3.03s ( 2.21s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 3.42s ( 3.95s elapsed)
Alloc rate 4,266,934,229 bytes per MUT second
Productivity 11.4% of total user, 9.9% of total elapsed
gc_alloc_block_sync: 335
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
Why did single complete in 0.28 seconds, but faster (poorly named, evidently) take 3.95 seconds?
I am no expert in Haskell-specific profiling, but I can see several possible problems in faster. You are walking the input list at least three times: once to get its length, once for splitAt (maybe twice, I'm not totally sure how it is implemented), and then again to read and sum its elements. In single, the list is walked only once.
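As a rough sketch (my own variant, not code from the question): since the length of list is already known here, the length pass can be dropped by splitting at a fixed index, which removes one full traversal of the ten-million-element list.

-- sketch: reuses `list` and the imports from the program above;
-- splitting at a known index avoids the separate `length` traversal
-- (splitAt still walks the first half once to build `as`)
fasterFixedSplit :: IO ()
fasterFixedSplit = print . runEval $ do
  let (as, bs) = splitAt 5000000 list
  res1 <- rpar (sum as)
  res2 <- rpar (sum bs)
  return (res1 + res2)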
You also hold the entire list in memory at once with faster, whereas with single Haskell can process it lazily and GC as it goes. If you look at the profiling output, you can see that faster copies many more bytes during GC: over 3,000 times more! faster also needed 400 MB of memory all at once, where single needed only about 40 KB at a time, so the garbage collector had a much larger space to keep scanning over.
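To illustrate the "GC as you go" point (again a sketch of mine, not part of the original answer, and the name is made up): a list that is generated lazily and consumed by a strict fold never needs to exist in memory all at once, which is the kind of streaming behaviour single benefits from.

import Data.List (foldl')

-- sketch: each cons cell of [1..10000000] can be collected as soon as
-- its element has been folded into the strict accumulator, so the
-- computation runs in roughly constant heap residency
streamingSum :: Integer
streamingSum = foldl' (+) 0 [1 .. 10000000]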
Another big issue: you allocate a ton of new cons cells in faster to hold the two intermediate sub-lists. Even if they could all be GC'd right away, that is a lot of time spent allocating; it is more expensive than just doing the additions in the first place. So even before you start adding, you are already "over budget" compared to single.
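One way to sidestep both the extra traversals and the intermediate cons cells (a sketch of mine, not the answer's code) is to spark two independent range sums, so no shared list ever needs to be split or retained; the rseq calls make the Eval computation wait for both sparks before combining them.

import Control.Parallel.Strategies (runEval, rpar, rseq)

-- sketch: each half is its own lazily generated range, so there is no
-- `length`, no `splitAt`, and no second set of cons cells to allocate
parRangeSum :: Integer -> Integer
parRangeSum n = runEval $ do
  let m = n `div` 2
  a <- rpar (sum [1 .. m])
  b <- rpar (sum [m + 1 .. n])
  _ <- rseq a            -- wait for both halves before adding them
  _ <- rseq b
  return (a + b)

-- e.g. print (parRangeSum 10000000) gives 50000005000000,
-- the same result as the runs above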