I understand how Map is easily parallelizable - each computer/CPU can just operate on a small portion of the array.
Is Reduce/foldl parallelizable? It seems like each computation depends on the previous one. Is it only parallelizable for certain kinds of functions?
If the operation underlying your reduction is associative*, you can play with the order of operations and with locality. You therefore often get a tree-like structure in the 'gather' phase, which lets you finish in several passes in logarithmic time:
a + b + c + d
 \ /     \ /
(a+b)   (c+d)
    \   /
((a+b)+(c+d))
instead of (((a+b)+c)+d)
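To make the reordering concrete, here is a minimal Python sketch of such a pairwise (tree-shaped) reduction. The function name pairwise_reduce and the use of ProcessPoolExecutor are my own choices for illustration, not part of any particular framework:

from concurrent.futures import ProcessPoolExecutor
from operator import add

def pairwise_reduce(op, xs):
    # Reduce xs with an associative op in roughly log2(len(xs)) passes.
    # Within one pass the pairs are independent, so they can run in parallel.
    with ProcessPoolExecutor() as pool:
        while len(xs) > 1:
            lefts = xs[0::2][:len(xs) // 2]
            rights = xs[1::2]
            combined = list(pool.map(op, lefts, rights))
            if len(xs) % 2:              # unpaired element carries over to the next pass
                combined.append(xs[-1])
            xs = combined
    return xs[0]

if __name__ == "__main__":
    # ((0+1)+(2+3)) + ((4+5)+(6+7)) = 28, the same result as a sequential left fold
    print(pairwise_reduce(add, list(range(8))))

Each pass halves the number of values, which is exactly the tree pictured above; associativity is what guarantees the regrouped result equals the left fold.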
If your operation is also commutative, further optimizations are possible, since you can gather the partial results in any order (this can matter for data alignment when the operations are vector operations, for example); see the sketch after the footnote below.
[*] That is, the mathematical operation you actually intend, not its implementation on machine types such as floats, which is of course not strictly associative.
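Here is a sketch of exploiting commutativity: each worker reduces its own chunk, and the partial results are folded in whatever order they happen to complete. The helper names (partial_sum, commutative_reduce) are hypothetical, chosen just for this example:

from concurrent.futures import ProcessPoolExecutor, as_completed
from functools import reduce
from operator import add

def partial_sum(chunk):
    # Sequential fold over one worker's slice of the data.
    return reduce(add, chunk)

def commutative_reduce(xs, workers=4):
    # Split the data into contiguous chunks, reduce each chunk in a separate
    # process, then fold the partial results in *completion* order.
    # Folding in arrival order is only valid because + is commutative
    # (as well as associative).
    step = max(1, len(xs) // workers)
    pieces = [xs[i:i + step] for i in range(0, len(xs), step)]
    acc = 0
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(partial_sum, piece) for piece in pieces]
        for fut in as_completed(futures):    # arbitrary arrival order
            acc = add(acc, fut.result())
    return acc

if __name__ == "__main__":
    print(commutative_reduce(list(range(100))))  # 4950, same as sum(range(100))

Without commutativity you would have to combine the partial results in chunk order; with it, you are free to fold them as they arrive.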