More generic parfoldr than this

Question

My aim is to have a parallel foldr function. At first, it seemed rather straight forward to achieve and this is what I had in mind:

First break up the input list into partitions based on the number of cores (numCapabilities). Then apply foldr to each partition, which will result in a list of folded values for each partition. Then do a foldr again on that list to obtain the final value.

    listChunkSize = numCapabilities

    chunk n [] = []
    chunk n xs = ys : chunk n zs
      where (ys,zs) = splitAt n xs

    parfoldr f z [] = z
    parfoldr f z xs = res
      where
            parts = chunk listChunkSize xs
            partsRs = map (foldr f z) parts `using` parList rdeepseq
            res = foldr f z partsRs

The above code does not work because obviously the definition of foldr, (a -> b -> b) -> b -> [a] -> b, implies that the input list type is (well, can be) different from the accumulator and result type.

For example,

1) foldr (+) 0 [1..10] => list type = accumulator type (Integer)

2) foldr (\i acc -> (i>5) && acc) True [1..10] => list type (Integer) ! = accumulator type (Bool)

So, looking at my code above, the map will generate a list of type b which is then passed as argument to the second foldr. But the second foldr accepts list of type a. So, that won't work.

An ugly solution would be to provide a different type signature for the parfoldr, e.g. parfoldr :: (NFData a) => (a -> a -> a) -> a -> [a] -> a

This will work but then it is not exactly equivalent to foldr. Example 1 above will do just fine, but not example 2. So, question 1 is: how to define parfoldr to have same type signature as foldr?

Comparing the 2 folds:

    input = [1..1000000]
    seqfold = foldr (+) 0
    parfold = parfoldr (+) 0

I get the foll. times on a dual core machine: (no -threaded flag)

    $ ./test
    seqfold: 4.99s
    parfold: 25.16s

(-threaded flag on)

    $ ./test
    seqfold: 5.32s
    parfold: 25.55s
    $ ./test +RTS -N1
    seqfold: 5.32s
    parfold: 25.53s
    $ ./test +RTS -N2
    seqfold: 3.48s
    parfold: 3.68s
    $ ./test +RTS -N3
    seqfold: 3.57s
    parfold: 2.36s
    $ ./test +RTS -N4
    seqfold: 3.03s
    parfold: 1.70s

Observations from these measurements:

foldr seems to give lower runtime when num of cores is increased. why is that?
parfold gives better runtimes for N => 3.

Any suggestions and ideas for improvement is appreciated :)

hammar · Accepted Answer

foldr is not in general parallelizable, as its interface allows sequential dependencies. In order to be able to rearrange the computations in the way you described you'll need to limit yourself to associative operators with an identity element. This is known as a monoid, and what you've implemented is essentially a parallel mconcat.

alternative · Answer

You can't, not exactly, because you have to depend on the property that you can split chunks. This means that, of course, you have to add the extra type restriction... The special case is if you have f :: a -> a -> a as your accumulating function, and f is associative.

Hence you would have to provide two functions, the one used in the chunks, and the one used to fold the chunk results. Your original version would just be a join on this function.

parfoldr :: NFData a => (a -> a -> a) -> a -> [a] -> a
parfoldr f = join $ parfoldr' f f

parfoldr' :: NFData b => (a -> b -> b) -> (b -> c -> c) -> b -> c -> [a] -> c
parfoldr' f g y z [] = z
parfoldr' f g y z xs = foldr g z partsRs
  where parts = chunk listChunkSize xs
        partsRs = map (foldr f y) parts `using` parList rdeepseq

Example 2 would then be

parfoldr' (\i acc -> (i>5) && acc) (&&) True True [1..10]

All in all, this isn't that much uglier.

More generic parfoldr than this

Tags:

haskell

parallel-processing

vis

2 Answers

hammar

alternative

Recent Activity

Donate For Us

More generic parfoldr than this

Tags:

haskell

parallel-processing

vis

2 Answers

hammar

alternative

Related questions

Recent Activity

Donate For Us