Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sequencing IO actions in parallel

I have a function that returns an IO action,

f :: Int -> IO Int

I would like to compute this function in parallel for multiple values of the argument. My naive implementation was as follows:

import Control.Parallel.Strategies

vals = [1..10]
main = do
      results <- mapM f vals
      let results' = results `using` parList rseq
      mapM_ print results'

My reasoning for this was that the first mapM binds something of type IO [Int] to results, results' applies a parallel strategy to the contained list, and the mapM_ finally requests the actual values by printing them - but what is to be printed is already sparked in parallel, so the program should parallelize.

After being happy that it does indeed use all my CPUs, I noticed that the program is less effective (as in wall clock time) when being run with +RTS -N8 than without any RTS flags. The only explanation I can think of is that the first mapM has to sequence - i.e. perform - all the IO actions already, but that would not lead to ineffectivity, but make the N8 execution as effective as the unparallelized one, because all the work is done by the master thread. Running the program with +RTS -N8 -s yields SPARKS: 36 (11 converted, 0 overflowed, 0 dud, 21 GC'd, 4 fizzled), which surely isn't optimal, but unfortunately I can't make any sense of it.

I suppose I've found one of the beginner's stepping stones in Haskell parallelization or the internals of the IO monad. What am I doing wrong?

Background info: f n is a function that returns the solution for Project Euler problem n. Since many of them have data to read, I put the result into the IO monad. An example of how it may look like is

-- Problem 13: Work out the first ten digits of the sum of one-hundred 50-digit numbers.

euler 13 = fmap (first10 . sum) numbers
      where
            numbers = fmap (map read . explode '\n') $ readFile "problem_13"
            first10 n
                  | n < 10^10 = n -- 10^10 is the first number with 11 digits
                  | otherwise  = first10 $ n `div` 10

The full file can be found here (It's a bit long, but the first few "euler X" functions should be representative enough), the main file where I do the parallelism is this one.

like image 435
David Avatar asked Oct 28 '12 15:10

David


1 Answers

Strategies are for parallel execution of pure computations. If it really is mandatory that your f returns an IO value, then consider using the async package instead. It provides useful combinators for running IO actions concurrently.

For your use case, mapConcurrently looks useful:

import Control.Concurrent.Async

vals = [1..10]
main = do
  results <- mapConcurrently f vals
  mapM_ print results

(I haven't tested though, because I don't know what your f is exactly.)

like image 149
kosmikus Avatar answered Sep 28 '22 10:09

kosmikus