I have a function frequencyBy which I would like to parallelize. Here follows a simple test case:
import Control.Parallel.Strategies
import Control.DeepSeq
import System.Environment
frequencyBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,Int)]
frequencyBy f as bs = map
  (\a -> (a, foldr (\b -> if f a b then (+) 1 else id) 0 bs)) as
main :: IO ()
main = do
  x:xs <- getArgs
  let result = frequencyBy (==) [1::Int .. 10000] [1 .. (read x)] `using`
                 parList rdeepseq
  print $ product $ map snd $ result
I would like to run the map in frequencyBy in parallel. I'm trying to achieve this using parList rdeepseq (all the other stuff in main is just there to make sure nothing is optimized away). However, this doesn't work: with two threads the program does twice as much total work, yet takes just as long as it does with one thread, so there is no speedup. I don't understand what I'm doing wrong here.
It could be that the overhead is slowing things down, depending on how big x is; if the work you're doing in each spark is comparable to the time it takes to spawn each spark (and of course there's scheduling overhead, etc.), then you'll run into problems.
You could try parListChunk, e.g. parListChunk 64 rdeepseq; you'll have to experiment to figure out which chunk size to use. While your current strategy creates a spark for every element of the list, parListChunk creates a spark for each chunk of a certain size, and uses the strategy you specify sequentially over each element of that chunk.
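A minimal sketch of what that change looks like, keeping the rest of the program as in the question (the chunk size 64 here is only a starting point to experiment with):

import Control.Parallel.Strategies
import Control.DeepSeq
import System.Environment

frequencyBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a, Int)]
frequencyBy f as bs = map
  (\a -> (a, foldr (\b -> if f a b then (+) 1 else id) 0 bs)) as

main :: IO ()
main = do
  x:_ <- getArgs
  let result = frequencyBy (==) [1 :: Int .. 10000] [1 .. read x]
                 `using` parListChunk 64 rdeepseq  -- one spark per 64-element chunk
  print $ product $ map snd result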
By the way, the foldr in frequencyBy is probably slowing things down due to excessive thunk creation; something like
frequencyBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,Int)]
frequencyBy f as bs = map (\a -> (a, sum . map (const 1) . filter (f a) $ bs)) as
should fix that.
Of course, as always, make sure you're compiling with -O2 and running with +RTS -N.
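Concretely, that means something like the following (the file name freq.hs is just for illustration; -threaded is required for +RTS -N to have any effect, and -N2 uses two cores):

ghc -O2 -threaded freq.hs
./freq 20000 +RTS -N2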
I think your parallelism is too fine-grained. parList attempts to evaluate every element in parallel, and there really isn't that much work for any one element.
When I change from parList to parListChunk 500 the execution time decreases by almost 50%; as I'm on a dual-core machine, that's about as good as it gets. For reference, I was testing with x=20000.
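In other words, the only change to the code from the question is the strategy line (500 is the chunk size that worked for me; it's worth trying other values):

  let result = frequencyBy (==) [1::Int .. 10000] [1 .. (read x)] `using`
                 parListChunk 500 rdeepseq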