I was trying to experiment with parallel evaluation in Haskell, but I seem to have hit a wall. I wanted to evaluate a list of tasks that each take a long time to complete, so I came up with this contrived example:
import Control.Parallel.Strategies

startNum = 800000

bigList :: [Integer]
bigList = [2042^x | x <- [startNum..startNum+10]]

main = print $ sum $ parMap rdeepseq (length . show) bigList
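(Each element of bigList is 2042^x for x near 800000, i.e. an integer with roughly x * log10(2042), or about 2.65 million, decimal digits, so each length . show task is genuinely expensive; 11 such tasks at roughly 2.65 million digits each also accounts for the printed sum of 29128678.)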
I compiled this with ghc -O2 -eventlog -rtsopts -threaded test.hs --make and then ran it twice.
$ time ./test +RTS -N1 -lf -sstderr
29128678
2,702,130,280 bytes allocated in the heap
59,409,320 bytes copied during GC
3,114,392 bytes maximum residency (68 sample(s))
1,093,600 bytes maximum slop
28 MB total memory in use (6 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 3101 colls, 0 par 0.09s 0.08s 0.0000s 0.0005s
Gen 1 68 colls, 0 par 0.03s 0.03s 0.0004s 0.0009s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 11 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 11 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 10.13s ( 10.13s elapsed)
GC time 0.11s ( 0.11s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 10.25s ( 10.25s elapsed)
Alloc rate 266,683,731 bytes per MUT second
Productivity 98.9% of total user, 98.9% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 0m10.250s
user 0m10.144s
sys 0m0.106s
$ time ./test +RTS -N4 -lf -sstderr
29128678
2,702,811,640 bytes allocated in the heap
712,017,768 bytes copied during GC
22,024,144 bytes maximum residency (67 sample(s))
6,134,968 bytes maximum slop
68 MB total memory in use (3 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1329 colls, 1329 par 2.77s 0.70s 0.0005s 0.0075s
Gen 1 67 colls, 66 par 0.11s 0.03s 0.0004s 0.0019s
Parallel GC work balance: 40.17% (serial 0%, perfect 100%)
TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)
SPARKS: 11 (11 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 51.56s ( 13.04s elapsed)
GC time 2.89s ( 0.73s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 54.45s ( 13.77s elapsed)
Alloc rate 52,423,243 bytes per MUT second
Productivity 94.7% of total user, 374.4% of total elapsed
gc_alloc_block_sync: 39520
whitehole_spin: 0
gen[0].sync: 3046
gen[1].sync: 4970
real 0m13.777s
user 0m44.362s
sys 0m10.093s
I notice a slight increase in GC time, but nothing that I would have thought the extra cores couldn't overcome.
So I got out ThreadScope to have a look.

This is the result for -N1:

[ThreadScope timeline for the -N1 run]

And this is the result for -N4:

[ThreadScope timeline for the -N4 run]
It seems like the sparks execute much more rapidly in the -N1 case. My question: why am I not seeing the speedup I would expect from a bunch of independent tasks executing in parallel?
This seems to come down to the Integer operations: if you replace them with something else, you do see a speedup from parallel processing.
This code shows no speedup with -N2:

import Control.Parallel

main =
  let x = length . show $ 10^10000000
      y = length . show $ 10^10000001 in
  x `par` y `pseq` print (x + y)
Nor does this:

import Control.Parallel

main =
  let x = (10^10000000 :: Integer) `quotRem` 123
      y = (10^10000001 :: Integer) `quotRem` 123 in
  x `par` y `pseq` print "ok"
But this code does show a parallel speedup:

import Control.Parallel

main =
  let x = length $ replicate 1000000000 'x'
      y = length $ replicate 1000000001 'y' in
  x `par` y `pseq` print (x + y)
I couldn't find any locking in integer-gmp, though.
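To sanity-check this against the original program's shape, here is a minimal sketch (mine, not from the original post, with a hypothetical busyWork function) that keeps the question's parMap rdeepseq structure but swaps the large-Integer workload for one that only touches machine Ints. Compiled with the same flags as above and run with +RTS -N4, I would expect this variant to scale with the number of cores, in line with the replicate example:

import Control.Parallel.Strategies

-- Hypothetical CPU-bound task that works only on machine Ints,
-- avoiding large-Integer arithmetic entirely.
busyWork :: Int -> Int
busyWork n = sum [(i * i) `mod` 7 | i <- [1 .. n]]

-- Same parMap rdeepseq shape as the program in the question.
main :: IO ()
main = print $ sum $ parMap rdeepseq busyWork (replicate 10 20000000)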