
How to use Parallel Strategies in Haskell

I have a function frequencyBy that I would like to parallelize. Here is a simple test case:

import Control.Parallel.Strategies
import Control.DeepSeq
import System.Environment

frequencyBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a, Int)]
frequencyBy f as bs =
  map (\a -> (a, foldr (\b -> if f a b then (+) 1 else id) 0 bs)) as

main :: IO ()
main = do
  x:xs <- getArgs
  let result = frequencyBy (==) [1::Int .. 10000] [1 .. (read x)] `using` 
                 parList rdeepseq
  print $ product $ map snd $ result

I would like to run the map in frequencyBy in parallel. I'm trying to achieve this using parList rdeepseq (all the other stuff in main is just to ensure not everything is optimized away). However, this gives no speedup: two threads together do twice as much work as one thread does, in the same wall-clock time. I don't understand what I'm doing wrong here.

asked Jan 13 '12 by user362382

2 Answers

It could be that the overhead is slowing things down, depending on how big x is; if the work you're doing in each spark is comparable to the time it takes to spawn each spark (and of course there's scheduling overhead, etc.), then you'll run into problems.

You could try parListChunk, e.g. parListChunk 64 rdeepseq; you'll have to experiment to figure out which chunk size to use. While your current strategy is creating a spark for every element of the list, parListChunk creates a spark for each chunk of a certain size in the list, and uses the strategy you specify sequentially over each element of that chunk.
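As a sketch of the suggested change (chunk size 64 is only a starting point to tune, and the list lengths here are scaled down so the example runs quickly):

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

frequencyBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a, Int)]
frequencyBy f as bs =
  map (\a -> (a, foldr (\b -> if f a b then (+) 1 else id) 0 bs)) as

main :: IO ()
main = do
  -- One spark per 64-element chunk instead of one per element;
  -- within a chunk, each element is evaluated with rdeepseq.
  let result = frequencyBy (==) [1 :: Int .. 1000] [1 .. 1000]
                 `using` parListChunk 64 rdeepseq
  print $ product $ map snd result
```

Here every element of the first list occurs exactly once in the second, so each count is 1 and the program prints 1.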

By the way, the foldr in frequencyBy is probably slowing things down due to excessive thunk creation; something like

frequencyBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,Int)]
frequencyBy f as bs = map (\a -> (a, sum . map (const 1) . filter (f a) $ bs)) as

should fix that.
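A quick sanity check (pure base Haskell, no parallelism; the function names are mine, and `length` stands in for the equivalent `sum . map (const 1)`) that the two definitions agree:

```haskell
-- The original foldr-based definition.
frequencyByFoldr :: (a -> b -> Bool) -> [a] -> [b] -> [(a, Int)]
frequencyByFoldr f as bs =
  map (\a -> (a, foldr (\b -> if f a b then (+) 1 else id) 0 bs)) as

-- The filter-based rewrite; length bs' == sum (map (const 1) bs').
frequencyByFilter :: (a -> b -> Bool) -> [a] -> [b] -> [(a, Int)]
frequencyByFilter f as bs =
  map (\a -> (a, length (filter (f a) bs))) as

main :: IO ()
main =
  print $ frequencyByFoldr (==) [1 :: Int, 2, 3] [1, 1, 2]
       == frequencyByFilter (==) [1, 2, 3] [1, 1, 2]
```

This prints True: both give [(1,2),(2,1),(3,0)] on that input.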

Of course, as always, make sure you're compiling with -O2 (and -threaded, which +RTS -N requires) and running with +RTS -N.
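For instance (the module name and argument value are placeholders):

```shell
# -threaded enables the parallel runtime; -rtsopts allows +RTS flags.
ghc -O2 -threaded -rtsopts Main.hs
# -N uses all available cores; -N2 would pin it to two.
./Main 20000 +RTS -N
```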

answered Nov 16 '22 by ehird


I think your parallelism is too fine-grained. parList attempts to evaluate every element in parallel, and there really isn't that much work for any one element.

When I change from parList to parListChunk 500 the execution time improves by almost 50%; as I'm on a dual-core machine that's about as good as it gets.

For reference, I was testing with x=20000.

answered Nov 16 '22 by John L