I'm writing a toy implementation of a rainbow table in Haskell. The main data structure is a strict `Map h c`, containing a large number of pairs, generated from random values `c`:
import qualified Data.Map.Strict as M
import System.Random
table :: (Ord h, RandomGen g, Random c) => Int -> g -> M.Map h c
table n = M.fromList . map (\c -> (chain c, c)) . take n . randoms
where `chain` is very expensive to compute. The part that dominates the computation time is embarrassingly parallel, so I would expect a quasi-linear speedup in the number of cores when it runs in parallel.
However, I would like the computed pairs to be added to the table straight away, rather than accumulated in a list in memory. It should be noted that collisions may occur, and in that case the redundant chains should be dropped as soon as possible. Heap profiling of the sequential version confirms this behavior.
I've found `parMap` from `Control.Parallel.Strategies` and tried to apply it to my table-building function:

table n = M.fromList . parMap (evalTuple2 rseq rseq) (\c -> (chain c, c)) . take n . randoms

but, running with `+RTS -N`, I get to about 1.3 cores of usage at best. Heap profiling indicates, at least, that the intermediate list does not reside in memory, but `+RTS -s` also reports that 0 sparks were created. How is this possible with my usage of `parMap`? What is the proper way to do this?
EDIT: `chain` is defined as:

chain :: (c -> h) -> [h -> c] -> c -> h
chain h = (h .) . flip (foldl' (flip (. h)))

where `(c -> h)` is the target hash function, from cleartext to hash, and `[h -> c]` is a family of reducer functions. I want the implementation to stay generic over `c` and `h`, but for benchmarking I use strict bytestrings for both.
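In pointful style, that definition is equivalent to the following sketch (`chain'` is just a renamed copy for illustration):

import Data.List (foldl')

-- Starting from a cleartext c0, alternately hash and reduce along the
-- chain, and finish with a final application of the hash function.
chain' :: (c -> h) -> [h -> c] -> c -> h
chain' hash reducers c0 = hash (foldl' (\c reduce -> reduce (hash c)) c0 reducers)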
Here is what I came up with. Let me know how the benchmarks work out:
#!/usr/bin/env stack
{- stack --resolver lts-14.1 script --optimize
--package scheduler
--package containers
--package random
--package splitmix
--package deepseq
-}
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq
import Control.Scheduler
import Data.Foldable as F
import Data.IORef
import Data.List (unfoldr)
import Data.Map.Strict as M
import System.Environment (getArgs)
import System.Random as R
import System.Random.SplitMix
-- for simplicity
chain :: Show a => a -> String
chain = show
-- Sequential construction: a strict accumulator loop that threads the
-- generator through and inserts each (chain c, c) pair as soon as it is
-- computed, so redundant chains never accumulate in a list.
makeTable :: Int -> SMGen -> (SMGen, M.Map String Int)
makeTable = go M.empty
  where
    go !acc i gen
      | i > 0 =
          let (c, gen') = R.random gen
           in go (M.insert (chain c) c acc) (i - 1) gen'
      | otherwise = (gen, acc)
-- Parallel construction: split the initial generator into one independent
-- generator per worker, let each worker build a partial table with the
-- sequential loop above, and merge the partial tables at the end.
makeTablePar :: Int -> SMGen -> IO (M.Map String Int)
makeTablePar n0 gen0 = do
  let gens = unfoldr (Just . splitSMGen) gen0
  gensState <- initWorkerStates Par (\(WorkerId wid) -> newIORef (gens !! wid))
  tables <-
    withSchedulerWS gensState $ \scheduler -> do
      let k = numWorkers (unwrapSchedulerWS scheduler)
          (q, r) = n0 `quotRem` k
      -- The n0 elements are split into k jobs of q elements each, plus one
      -- extra job for the remainder r.
      forM_ ((if r == 0 then [] else [r]) ++ replicate k q) $ \n ->
        scheduleWorkState scheduler $ \genRef -> do
          gen <- readIORef genRef
          let (gen', table) = makeTable n gen
          writeIORef genRef gen'
          table `deepseq` pure table
  pure $ F.foldl' M.union M.empty tables
main :: IO ()
main = do
  [n] <- fmap read <$> getArgs
  gen <- initSMGen
  print =<< makeTablePar n gen
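To try it out (assuming the script is saved as MakeTable.hs and made executable), run something like `./MakeTable.hs 100000`; the stack header takes care of the dependencies and, thanks to `--optimize`, compiles the script with optimizations before running it.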
A few notes on the implementation:

- Don't use `random`, it is hella slow; `splitmix` is about x200 faster.
- In `makeTable`, if you want duplicate results to be discarded right away, then a manual loop or unfold is required. But since we need the generator returned, I opted for the manual loop.
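A side note on collision handling: `M.insert` replaces an existing entry, so the table keeps the last chain generated for a given hash. If you would rather keep the first one, `M.insertWith` from `Data.Map.Strict` can express that; a minimal sketch (`insertKeepFirst` is a hypothetical helper, not part of the script above):

import qualified Data.Map.Strict as M

-- insertWith applies its combining function to the new and the old value
-- when the key is already present; always returning the old value keeps
-- the first chain seen for a given hash.
insertKeepFirst :: Ord k => k -> v -> M.Map k v -> M.Map k v
insertKeepFirst = M.insertWith (\_new old -> old)

Either way a colliding chain is redundant and can safely be dropped.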