Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slowdown when using parallel strategies in Haskell

I was working through the exercises of Andre Loh's deterministic parallel programming in haskell exercises. I was trying to convert the N-Queens sequential code into parallel by using strategies, but I noticed that the parallel code runs much slower than the sequential code and also errors out with insufficient stack space.

This is the code for the parallel N-Queens,

import Control.Monad
import System.Environment
import GHC.Conc
import Control.Parallel.Strategies
import Data.List
import Data.Function

type PartialSolution = [Int] -- per column, list the row the queen is in
type Solution = PartialSolution

type BoardSize = Int

chunk :: Int -> [a] -> [[a]]
chunk n [] = []
chunk n xs = case splitAt n xs of
         (ys, zs) -> ys : chunk n zs

-- Generate all solutions for a given board size.
queens :: BoardSize -> [Solution]
--queens n = iterate (concatMap (addQueen n)) [[]] !! n
queens n = iterate (\l -> concat (map (addQueen n) l `using` parListChunk (n `div`            numCapabilities) rdeepseq)) [[]] !! n


-- Given the size of the problem and a partial solution for the
-- first few columns, find all possible assignments for the next
-- column and extend the partial solution.
addQueen :: BoardSize -> PartialSolution -> [PartialSolution]
addQueen n s = [ x : s | x <- [1..n], safe x s 1 ]

-- Given a row number, a partial solution and an offset, check
-- that a queen placed at that row threatens no queen in the
-- partial solution.
safe :: Int -> PartialSolution -> Int -> Bool
safe x []    n = True
safe x (c:y) n = x /= c && x /= c + n && x /= c - n && safe x y (n + 1)

main = do
        [n] <- getArgs
        print $ length $ queens (read n)

The line (\l -> concat (map (addQueen n) l using parListChunk (n div numCapabilities) rdeepseq)) is what I changed from the original code. I have seen Simon Marlow's solution but I wanted to know the reason for the slowdown and error in my code.

Thanks in advance.

like image 950
prasannak Avatar asked Apr 04 '12 10:04

prasannak


1 Answers

You are sparking way too much work. The parListChunk parameter of div n numCapabilities is probably, what, 7 on your system (2 cores and you're running with n ~ 14). The list is going to grow large very quickly so there is no point in sparking such small units of work (and I don't see why it makes sense tying it to the value of n).

If I add a factor of ten (making the sparking unit 70 in this case) then I get a clear performance win over single threading. Also, I don't have the stack issue you refer to - if it goes away with a change to your parListChunk value then I'd report that as a bug.

If I make the chunking every 800 then the times top off at 5.375s vs 7.9s. Over 800 and the performance starts to get worse again, ymmv.

EDIT:

[tommd@mavlo Test]$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.0.4
[tommd@mavlo Test]$ ghc -O2 so.hs -rtsopts -threaded -fforce-recomp ; time ./so 13 +RTS -N2
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
73712
real    0m5.404s

[tommd@mavlo Test]$ ghc -O2 so.hs -rtsopts -fforce-recomp ; time ./so 13
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
73712
real    0m8.134s
like image 173
Thomas M. DuBuisson Avatar answered Sep 29 '22 12:09

Thomas M. DuBuisson