Similar to this related question, I would like to perform a parallel map on a Vector, but in my case I have a nested Vector, and I can't seem to get the evaluation correct.
Here is what I have so far:
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Data.Vector.Strategies
import Control.DeepSeq
main = do
let res = genVVec 200 `using` parVector 2
print res
genUVec :: Int -> U.Vector Int
genUVec n = U.map (ack 2) $ U.enumFromN n 75
genVVec :: Int -> V.Vector (U.Vector Int)
genVVec n = V.map genUVec $ V.enumFromN 0 n
ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))
instance (NFData a, U.Unbox a) => NFData (U.Vector a) where
rnf = rnf . U.toList
gives:
$ ./vectorPar +RTS -N8 -s >/dev/null
SPARKS: 200 (17 converted, 183 pruned)
Total time 1.37s ( 1.30s elapsed)
$ ./vectorPar +RTS -s >/dev/null
SPARKS: 200 (0 converted, 200 pruned)
Total time 1.25s ( 1.26s elapsed)
I have also tried modifying the parVector
function in vector-strategies directly, but my attempts are clumsy and ineffective.
I realize REPA was designed for nested workloads, but this seems a simple enough problem, and I'd rather not have to rewrite a lot of code.
Note: Guilty author of vector-strategies here (which is a very small title, seeing as this was just a hacked up function I figured others would find useful).
Your observation that parVector
is wrong in that it allows the sparks to be GCed prior to use seems to be correct. The advice by SimonM means I must do precisely what I was trying to avoid, construct a new vector, at some cost, in place of the old one. Knowing this is necessary, there is little reason not to change parVector
to the much simpler definition of:
parVector2 :: NFData a => Int -> Strategy (V.Vector a)
parVector2 n = liftM V.fromList . parListChunk n rdeepseq . V.toList
Notice the fix given by John L only works because it "beats" the collector by forcing the computations before collection would occur.
I'll be changing the vector-strategies library so this is unnecessary - making your original code work fine. Unfortunately, this will incur the above-mentioned cost of constructing a new Vector (usually minimal).
The problem appears to be that parVector
doesn't force evaluation of the elements of the vector. Each element remains a thunk and nothing is sparked until the vector is consumed (by being printed), which is too late for the sparks to do work. You can force evaluation of each element by composing the parVector
strategy with rdeepseq
.
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Data.Vector.Strategies
import Control.DeepSeq
import Control.Parallel.Strategies
main = do
let res = genVVec 200 `using` (rdeepseq `dot` parVector 20)
print res
genUVec :: Int -> U.Vector Int
genUVec n = U.map (ack 2) $ U.enumFromN n 75
genVVec :: Int -> V.Vector (U.Vector Int)
genVVec n = V.map genUVec $ V.enumFromN 0 n
ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))
instance (NFData a, U.Unbox a) => NFData (U.Vector a) where
rnf vec = seq vec ()
instance (NFData a) => NFData (V.Vector a) where
rnf = rnf . V.toList
I also changed your NFData (U.Vector a)
instance. Since a U.Vector
is unboxed, evaluation to WHNF is sufficient, and forcing each element via the list conversion is wasteful. In fact the default definition for rnf
works fine if you like.
With these two changes, I get the following
SPARKS: 200 (200 converted, 0 pruned)
and the runtime has been reduced by nearly 50%. I also adjusted the vector chunk size to 20, but the result is very similar to a chunk size of 2.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With