 

Is there a parallel find in Haskell?

I have a brute-force problem I'd like to solve in Haskell. My machine has 16 cores, so I want to speed up my current algorithm a bit.

I have a function "tryCombination" which returns either a Just String or a Nothing. My loop looks like this:

findSolution = find (isJust)  [tryCombination a1 a2 a3 n z p |
                               a1 <- [600..700],
                               a2 <- [600..700],
                               a3 <- [600..700],
                               n  <- [1..100],
                               ....

I know there is a special parMap to parallelize a map function. A mapFind could be tricky, as it is not predictable whether a thread really finds the first occurrence. But is there something like a mapAny to speed up the search?

EDIT:

I rewrote the code using the "withStrategy (parList rseq)" snippet. The status report looks like this:
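A sketch of that rewrite (with a dummy tryCombination and shrunken ranges for illustration; the real function and search space are larger):

```haskell
import Control.Parallel.Strategies (parList, rseq, withStrategy)
import Data.List (find)
import Data.Maybe (isJust)

-- Dummy stand-in for the real tryCombination; only its
-- Maybe String result type matters for the parallelization.
tryCombination :: Int -> Int -> Int -> Int -> Maybe String
tryCombination a1 a2 a3 n
  | (a1 * a2 * a3) `mod` n == 1 = Just (show (a1, a2, a3, n))
  | otherwise                   = Nothing

-- parList rseq sparks one evaluation per list element.
findSolution :: Maybe (Maybe String)
findSolution =
    find isJust $
    withStrategy (parList rseq)
    [ tryCombination a1 a2 a3 n
    | a1 <- [600..605]   -- ranges shrunk for illustration
    , a2 <- [600..605]
    , a3 <- [600..605]
    , n  <- [1..100]]

main :: IO ()
main = print findSolution
```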

38,929,334,968 bytes allocated in the heap
 2,215,280,048 bytes copied during GC
     3,505,624 bytes maximum residency (795 sample(s))
       202,696 bytes maximum slop
            15 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
Gen  0     44922 colls, 44922 par   37.33s    8.34s     0.0002s    0.0470s
Gen  1       795 colls,   794 par    7.58s    1.43s     0.0018s    0.0466s

Parallel GC work balance: 4.36% (serial 0%, perfect 100%)

TASKS: 10 (1 bound, 9 peak workers (9 total), using -N8)

SPARKS: 17576 (8198 converted, 9378 overflowed, 0 dud, 0 GC'd, 0 fizzled)

INIT    time    0.00s  (  0.00s elapsed)
MUT     time   81.79s  ( 36.37s elapsed)
GC      time   44.91s  (  9.77s elapsed)
EXIT    time    0.00s  (  0.00s elapsed)
Total   time  126.72s  ( 46.14s elapsed)

Alloc rate    475,959,220 bytes per MUT second

Productivity  64.6% of total user, 177.3% of total elapsed

gc_alloc_block_sync: 834851
whitehole_spin: 0
gen[0].sync: 10
gen[1].sync: 3724

As I already mentioned (see my comments), all the cores are working for only three seconds (when all the sparks are processed). For the following 30 s, all the work is done by a single core. How can I optimize further?

Some more EDIT:

I now gave "withStrategy (parBuffer 10 rdeepseq)" a try and fiddled around with different buffer sizes:

Buffer size    GC work balance     MUT       GC
         10                50%    11.69s    0.94s
        100                47%    12.31s    1.67s
        500                40%    11.50s    1.35s
       5000                21%    11.47s    2.25s

First of all, this is a big improvement over the 59 s it took without any multithreading. The second conclusion is that the buffer size should be as small as possible, but bigger than the number of cores. Best of all, there are no overflowed or fizzled sparks any more: all were converted successfully.
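For reference, the parBuffer variant looks like this (again with a dummy tryCombination, since the real one isn't shown here):

```haskell
import Control.Parallel.Strategies (parBuffer, rdeepseq, withStrategy)
import Data.List (find)
import Data.Maybe (isJust)

-- Dummy stand-in for the real tryCombination; only its
-- Maybe String result type matters here.
tryCombination :: Int -> Int -> Int -> Int -> Maybe String
tryCombination a1 a2 a3 n
  | (a1 * a2 * a3) `mod` n == 1 = Just (show (a1, a2, a3, n))
  | otherwise                   = Nothing

-- parBuffer 10 keeps a rolling window of 10 sparks alive instead of
-- sparking the whole list at once, so no sparks overflow the pool.
findSolution :: Maybe (Maybe String)
findSolution =
    find isJust $
    withStrategy (parBuffer 10 rdeepseq)
    [ tryCombination a1 a2 a3 n
    | a1 <- [600..700]
    , a2 <- [600..700]
    , a3 <- [600..700]
    , n  <- [1..100]]

main :: IO ()
main = print findSolution
```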

Hennes asked Apr 23 '15

1 Answer

Depending on the laziness of tryCombination and the desired parallelization, one of these might do what you want:

import Control.Parallel.Strategies
findSolution =
    find (isJust) $
    withStrategy (parList rseq) $
    [ tryCombination a1 a2 a3 n z p
    | a1 <- [600..700]
    , a2 <- [600..700]
    , a3 <- [600..700]
    , n  <- [1..100]]

This parallelizes the work performed by tryCombination to figure out whether it is a Just or a Nothing, but does not evaluate the actual result inside the Just.
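To see the difference: rseq evaluates only to weak head normal form (WHNF), i.e. just far enough to expose the Just/Nothing constructor. A small illustration using seq and force (force behaves like rdeepseq):

```haskell
import Control.DeepSeq (force)       -- deepseq ships with GHC
import Control.Exception (SomeException, evaluate, try)

-- seq (like rseq) stops at the outermost constructor, so the
-- undefined payload inside the Just is never touched.
whnfIsShallow :: Bool
whnfIsShallow = Just (undefined :: String) `seq` True

main :: IO ()
main = do
  print whnfIsShallow  -- True: WHNF evaluation survives the bad payload
  -- force (like rdeepseq) evaluates inside the Just and hits undefined:
  r <- try (evaluate (force (Just (undefined :: String))))
         :: IO (Either SomeException (Maybe String))
  putStrLn (either (const "rdeepseq-style forcing hit the payload")
                   (const "payload was defined") r)
```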

If there is no such laziness to be exploited and the result type is simple, it might work better to write

findSolution =
    find (isJust) $
    withStrategy (parList rdeepseq) $
    [ tryCombination a1 a2 a3 n z p
    | a1 <- [600..700]
    , a2 <- [600..700]
    , a3 <- [600..700]
    , n  <- [1..100]]
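In either case, these strategies only help when the program is built with the threaded runtime and run on several capabilities; a typical invocation (module file name assumed) is:

```shell
# Compile with optimizations, the threaded RTS, and runtime options enabled
ghc -O2 -threaded -rtsopts Main.hs

# Run on 8 capabilities and print the RTS statistics shown in the question
./Main +RTS -N8 -s
```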
Joachim Breitner answered Oct 20 '22