My aim is to have a parallel foldr function. At first, it seemed rather straightforward to achieve, and this is what I had in mind:
First break up the input list into partitions based on the number of
cores (numCapabilities). Then apply foldr to each partition, which
will result in a list of folded values, one for each partition. Then do a
foldr again on that list to obtain the final value.
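import Control.Parallel.Strategies (parList, rdeepseq, using)
import GHC.Conc (numCapabilities)

listChunkSize = numCapabilities

-- split a list into pieces of at most n elements
chunk n [] = []
chunk n xs = ys : chunk n zs
  where (ys,zs) = splitAt n xs

parfoldr f z [] = z
parfoldr f z xs = res
  where
    parts   = chunk listChunkSize xs
    partsRs = map (foldr f z) parts `using` parList rdeepseq
    res     = foldr f z partsRs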
The above code does not work because the type of foldr,
(a -> b -> b) -> b -> [a] -> b, implies that the element type of the
input list is (well, can be) different from the accumulator and result type.
For example,
1) foldr (+) 0 [1..10]
=> list type = accumulator type (Integer)
2) foldr (\i acc -> (i>5) && acc) True [1..10]
=> list type (Integer) != accumulator type (Bool)
So, looking at my code above, the map will generate a list whose elements have type b,
which is then passed as the argument to the second foldr. But the second
foldr, using the same f, accepts a list whose elements have type a. So, that won't work.
An ugly solution would be to provide a different type signature for
the parfoldr, e.g.
parfoldr :: (NFData a) => (a -> a -> a) -> a -> [a] -> a
This will work, but then it is not exactly equivalent to foldr. Example 1 above will do just fine, but not example 2. So, question 1 is: how can parfoldr be defined so that it has the same type signature as foldr?
Comparing the 2 folds:
input = [1..1000000]
seqfold = foldr (+) 0
parfold = parfoldr (+) 0
I get the following times on a dual-core machine (no -threaded flag):
$ ./test
seqfold: 4.99s
parfold: 25.16s
(-threaded flag on)
$ ./test
seqfold: 5.32s
parfold: 25.55s
$ ./test +RTS -N1
seqfold: 5.32s
parfold: 25.53s
$ ./test +RTS -N2
seqfold: 3.48s
parfold: 3.68s
$ ./test +RTS -N3
seqfold: 3.57s
parfold: 2.36s
$ ./test +RTS -N4
seqfold: 3.03s
parfold: 1.70s
Observations from these measurements:
foldr seems to give a lower runtime as the number of cores is increased. Why is that?
parfold gives better runtimes than foldr for N >= 3.
Any suggestions and ideas for improvement are appreciated :)
foldr is not in general parallelizable, as its interface allows sequential dependencies. In order to be able to rearrange the computations in the way you described, you'll need to limit yourself to associative operators with an identity element. This structure is known as a monoid, and what you've implemented is essentially a parallel mconcat.
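The monoid view can be sketched as a parallel foldMap: map each element into a monoid, reduce each chunk in parallel, then mconcat the per-chunk results. This is only a minimal sketch, not a tuned implementation. The names parFoldMap and chunksOf' and the explicit chunk-size argument are assumptions made for illustration; they do not come from the question or from any library. It assumes the parallel and deepseq packages are available.

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parList, rdeepseq, using)
import Data.Monoid (All (..), Sum (..))

-- Split a list into pieces of at most n elements (n is assumed positive).
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' n xs = ys : chunksOf' n zs
  where (ys, zs) = splitAt n xs

-- Hypothetical helper: map every element into a monoid, reduce each chunk
-- in parallel, then combine the per-chunk results sequentially.
parFoldMap :: (NFData m, Monoid m) => Int -> (a -> m) -> [a] -> m
parFoldMap n f xs = mconcat partials
  where
    partials = map (foldMap f) (chunksOf' n xs) `using` parList rdeepseq

main :: IO ()
main = do
  -- Example 1 from the question: a sum, expressed with the Sum monoid.
  print (getSum (parFoldMap 1000 Sum [1 .. 10000 :: Integer]))      -- 50005000
  -- Example 2 from the question: "is every element > 5?", with the All monoid.
  print (getAll (parFoldMap 3 (All . (> 5)) [1 .. 10 :: Integer]))  -- False
```

Here the element type and the result type may differ, as in example 2, because the mapping function f carries each element into the monoid before any combining happens. Associativity of the monoid operation is what makes the chunk boundaries irrelevant to the result.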
You can't, not exactly, because you have to depend on the property that you can split chunks. This means that, of course, you have to add the extra type restriction... The special case is if you have f :: a -> a -> a as your accumulating function, and f is associative.
Hence you would have to provide two functions, the one used in the chunks, and the one used to fold the chunk results. Your original version would just be a join on this function.
-- join comes from Control.Monad; chunk and listChunkSize are as in the question
parfoldr :: NFData a => (a -> a -> a) -> a -> [a] -> a
parfoldr f = join $ parfoldr' f f

parfoldr' :: NFData b => (a -> b -> b) -> (b -> c -> c) -> b -> c -> [a] -> c
parfoldr' f g y z [] = z
parfoldr' f g y z xs = foldr g z partsRs
  where parts   = chunk listChunkSize xs
        partsRs = map (foldr f y) parts `using` parList rdeepseq
Example 2 would then be
parfoldr' (\i acc -> (i>5) && acc) (&&) True True [1..10]
All in all, this isn't that much uglier.
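Whether a given pair of functions is safe to use this way can be checked on small inputs by comparing against a plain foldr. The sketch below uses a purely sequential model of parfoldr' (no parallelism, base only). The name chunkedFoldr and its explicit chunk-size argument are assumptions made for illustration. It computes the same value parfoldr' would for that chunk size, which is enough to see where associativity matters.

```haskell
-- Split a list into pieces of at most n elements (n is assumed positive).
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = ys : chunk n zs
  where (ys, zs) = splitAt n xs

-- Hypothetical sequential model of parfoldr': fold each chunk with f
-- (starting from y), then fold the per-chunk results with g (starting from z).
chunkedFoldr :: Int -> (a -> b -> b) -> (b -> c -> c) -> b -> c -> [a] -> c
chunkedFoldr n f g y z xs = foldr g z (map (foldr f y) (chunk n xs))

main :: IO ()
main = do
  let xs = [1 .. 10] :: [Integer]
      gt5 i acc = (i > 5) && acc
  -- Associative operator with identity 0: both folds agree.
  print (foldr (+) 0 xs, chunkedFoldr 2 (+) (+) 0 0 xs)            -- (55,55)
  -- Example 2: chunks are combined with (&&), which is associative.
  print (foldr gt5 True xs, chunkedFoldr 2 gt5 (&&) True True xs)  -- (False,False)
  -- Subtraction is not associative: the chunked result differs.
  print (foldr (-) 0 xs, chunkedFoldr 2 (-) (-) 0 0 xs)            -- (-5,-1)
```

The last line shows why the restriction is needed: with a non-associative operator, regrouping the work into chunks changes the answer.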