I was trying to benchmark parMap against plain map with a very simple example:
import Control.Parallel.Strategies
import Criterion.Main
sq x = x^2
a = whnf sum $ map sq [1..1000000]
b = whnf sum $ parMap rseq sq [1..1000000]
main = defaultMain
  [ bench "1" a
  , bench "2" b
  ]
My results seem to indicate zero speedup from parMap, and I was wondering why that might be.
benchmarking 1
Warning: Couldn't open /dev/urandom
Warning: using system clock for seed instead (quality will be lower)
time 177.7 ms (165.5 ms .. 186.1 ms)
0.997 R² (0.992 R² .. 1.000 R²)
mean 185.1 ms (179.9 ms .. 194.1 ms)
std dev 8.265 ms (602.3 us .. 10.57 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking 2
time 182.7 ms (165.4 ms .. 199.5 ms)
0.993 R² (0.976 R² .. 1.000 R²)
mean 189.4 ms (181.1 ms .. 195.3 ms)
std dev 8.242 ms (5.896 ms .. 10.16 ms)
variance introduced by outliers: 14% (moderately inflated)
The problem is that parMap sparks a parallel computation for each individual list element. It doesn't chunk the list at all, as you seem to assume from your comments; that would require the parListChunk strategy. Each spark carries real overhead, and since a spark here does nothing but square a single number, the useful work is overwhelmed by that overhead, which is why you see no speedup.
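A minimal sketch of the chunked version (the chunk size of 1000 is an arbitrary illustrative choice, not a tuned value; note also that the program must be compiled with -threaded and run with +RTS -N for the sparks to actually run on multiple cores):

import Control.Parallel.Strategies
import Criterion.Main

sq :: Int -> Int
sq x = x^2

-- One spark per 1000-element chunk instead of one spark per element.
c :: Benchmarkable
c = whnf sum (map sq [1..1000000] `using` parListChunk 1000 rseq)

main :: IO ()
main = defaultMain [ bench "chunked" c ]

With coarser chunks, each spark amortizes its scheduling cost over a thousand multiplications instead of one.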