Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

More generic parfoldr than this

My aim is to have a parallel foldr function. At first, it seemed rather straight forward to achieve and this is what I had in mind:

First break up the input list into partitions based on the number of cores (numCapabilities). Then apply foldr to each partition, which will result in a list of folded values for each partition. Then do a foldr again on that list to obtain the final value.

    listChunkSize = numCapabilities

    chunk n [] = []
    chunk n xs = ys : chunk n zs
      where (ys,zs) = splitAt n xs

    parfoldr f z [] = z
    parfoldr f z xs = res
      where
            parts = chunk listChunkSize xs
            partsRs = map (foldr f z) parts `using` parList rdeepseq
            res = foldr f z partsRs

The above code does not work because obviously the definition of foldr, (a -> b -> b) -> b -> [a] -> b, implies that the input list type is (well, can be) different from the accumulator and result type.

For example,

1) foldr (+) 0 [1..10] => list type = accumulator type (Integer)

2) foldr (\i acc -> (i>5) && acc) True [1..10] => list type (Integer) ! = accumulator type (Bool)

So, looking at my code above, the map will generate a list of type b which is then passed as argument to the second foldr. But the second foldr accepts list of type a. So, that won't work.

An ugly solution would be to provide a different type signature for the parfoldr, e.g. parfoldr :: (NFData a) => (a -> a -> a) -> a -> [a] -> a

This will work but then it is not exactly equivalent to foldr. Example 1 above will do just fine, but not example 2. So, question 1 is: how to define parfoldr to have same type signature as foldr?

Comparing the 2 folds:

    input = [1..1000000]
    seqfold = foldr (+) 0
    parfold = parfoldr (+) 0

I get the foll. times on a dual core machine: (no -threaded flag)

    $ ./test
    seqfold: 4.99s
    parfold: 25.16s

(-threaded flag on)

    $ ./test
    seqfold: 5.32s
    parfold: 25.55s
    $ ./test +RTS -N1
    seqfold: 5.32s
    parfold: 25.53s
    $ ./test +RTS -N2
    seqfold: 3.48s
    parfold: 3.68s
    $ ./test +RTS -N3
    seqfold: 3.57s
    parfold: 2.36s
    $ ./test +RTS -N4
    seqfold: 3.03s
    parfold: 1.70s

Observations from these measurements:

  • foldr seems to give lower runtime when num of cores is increased. why is that?

  • parfold gives better runtimes for N => 3.

Any suggestions and ideas for improvement is appreciated :)

like image 218
vis Avatar asked Jul 23 '11 15:07

vis


2 Answers

foldr is not in general parallelizable, as its interface allows sequential dependencies. In order to be able to rearrange the computations in the way you described you'll need to limit yourself to associative operators with an identity element. This is known as a monoid, and what you've implemented is essentially a parallel mconcat.

like image 186
hammar Avatar answered Oct 31 '22 07:10

hammar


You can't, not exactly, because you have to depend on the property that you can split chunks. This means that, of course, you have to add the extra type restriction... The special case is if you have f :: a -> a -> a as your accumulating function, and f is associative.

Hence you would have to provide two functions, the one used in the chunks, and the one used to fold the chunk results. Your original version would just be a join on this function.

parfoldr :: NFData a => (a -> a -> a) -> a -> [a] -> a
parfoldr f = join $ parfoldr' f f

parfoldr' :: NFData b => (a -> b -> b) -> (b -> c -> c) -> b -> c -> [a] -> c
parfoldr' f g y z [] = z
parfoldr' f g y z xs = foldr g z partsRs
  where parts = chunk listChunkSize xs
        partsRs = map (foldr f y) parts `using` parList rdeepseq

Example 2 would then be

parfoldr' (\i acc -> (i>5) && acc) (&&) True True [1..10]

All in all, this isn't that much uglier.

like image 41
alternative Avatar answered Oct 31 '22 06:10

alternative