I'm trying to wrap my head around parallel strategies. I think I understand what each of the combinators does, but every time I try using them with more than one core, the program slows down considerably.
For example, a while back I tried to calculate histograms (and from them, unique words) from ~700 documents. I thought that file-level granularity would be OK. With -N4 I get a work balance of 1.70, yet with -N1 the program runs in half the time it does with -N4. I'm not sure what the question really is, but I'd like to know how to decide where/when/how to parallelize and to gain some understanding of it. How would this be parallelized so that the speed increases with cores instead of decreasing?
import Data.Map (Map)
import qualified Data.Map as M
import System.Directory
import Control.Applicative
import Data.Vector (Vector)
import qualified Data.Vector as V
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Data.Text (Text)
import System.FilePath ((</>))
import Control.Parallel.Strategies
import qualified Data.Set as S
import Data.Set (Set)
import GHC.Conc (pseq, numCapabilities)
import Data.List (foldl')
mapReduce stratm m stratr r xs =
    let mapped  = parMap stratm m xs
        reduced = r mapped `using` stratr
    in mapped `pseq` reduced
type Histogram = Map Text Int
rootDir = "/home/masse/Documents/text_conversion/"
finnishStop = ["minä", "sinä", "hän", "kuitenkin", "jälkeen", "mukaanlukien", "koska", "mutta", "jos", "kuitenkin", "kun", "kunnes", "sanoo", "sanoi", "sanoa", "miksi", "vielä", "sinun"]
englishStop = ["a","able","about","across","after","all","almost","also","am","among","an","and","any","are","as","at","be","because","been","but","by","can","cannot","could","dear","did","do","does","either","else","ever","every","for","from","get","got","had","has","have","he","her","hers","him","his","how","however","i","if","in","into","is","it","its","just","least","let","like","likely","may","me","might","most","must","my","neither","no","nor","not","of","off","often","on","only","or","other","our","own","rather","said","say","says","she","should","since","so","some","than","that","the","their","them","then","there","these","they","this","tis","to","too","twas","us","wants","was","we","were","what","when","where","which","while","who","whom","why","will","with","would","yet","you","your"]
isStopWord :: Text -> Bool
isStopWord x = x `elem` (finnishStop ++ englishStop)
textFiles :: IO [FilePath]
textFiles = map (rootDir </>) . filter (not . meta) <$> getDirectoryContents rootDir
  where meta "."  = True
        meta ".." = True
        meta _    = False
histogram :: Text -> Histogram
histogram = foldr (\k -> M.insertWith' (+) k 1) M.empty . filter (not . isStopWord) . T.words
wordList = do
    files <- mapM TI.readFile =<< textFiles
    return $ mapReduce rseq histogram rseq reduce files
  where reduce = M.unions
main = do
  list <- wordList
  print $ M.size list
As for the text files, I'm using PDFs converted to text files, so I can't provide them, but for this purpose almost any book(s) from Project Gutenberg should do.
Edit: Added imports to script
In practice, getting the parallel combinators to scale well can be difficult. Others have mentioned making your code more strict to ensure you are actually doing the work in parallel, which is definitely important.
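For example, forcing each per-document histogram to normal form inside its own spark keeps the counting work out of the sequential merge. Here is a minimal sketch, not the asker's mapReduce: it assumes the NFData instances for Map and Text that ship with recent containers and text, and it merges counts with unionsWith (+) rather than the left-biased unions.
import Control.Parallel.Strategies (parMap, rdeepseq)
import qualified Data.Map as M
import Data.Text (Text)
type Histogram = M.Map Text Int
-- rdeepseq forces each histogram to normal form inside its spark, so the
-- counting work actually happens in parallel instead of being deferred to
-- the merge.
parHistograms :: (Text -> Histogram) -> [Text] -> Histogram
parHistograms mkHist docs =
    M.unionsWith (+) (parMap rdeepseq mkHist docs)  -- merge fully forced maps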
Two things that can really kill performance are lots of memory traversal and garbage collections. Even if you are not producing a lot of garbage, lots of memory traversals put more pressure on the CPU cache and eventually your memory bus becomes the bottleneck. Your isStopWord function performs a lot of string comparisons and has to traverse a rather long linked list to do so. You can save a lot of work by using the built-in Set type or, even better, the HashSet type from the unordered-containers package (since repeated string comparisons can be expensive, especially if they share common prefixes).
import Data.HashSet (HashSet)
import qualified Data.HashSet as S
...
finnishStop :: [Text]
finnishStop = ["minä", "sinä", "hän", "kuitenkin", "jälkeen", "mukaanlukien", "koska", "mutta", "jos", "kuitenkin", "kun", "kunnes", "sanoo", "sanoi", "sanoa", "miksi", "vielä", "sinun"]
englishStop :: [Text]
englishStop = ["a","able","about","across","after","all","almost","also","am","among","an","and","any","are","as","at","be","because","been","but","by","can","cannot","could","dear","did","do","does","either","else","ever","every","for","from","get","got","had","has","have","he","her","hers","him","his","how","however","i","if","in","into","is","it","its","just","least","let","like","likely","may","me","might","most","must","my","neither","no","nor","not","of","off","often","on","only","or","other","our","own","rather","said","say","says","she","should","since","so","some","than","that","the","their","them","then","there","these","they","this","tis","to","too","twas","us","wants","was","we","were","what","when","where","which","while","who","whom","why","will","with","would","yet","you","your"]
stopWord :: HashSet Text
stopWord = S.fromList (finnishStop ++ englishStop)
isStopWord :: Text -> Bool
isStopWord x = x `S.member` stopWord
Replacing your isStopWord function with this version performs much better and scales much better (though definitely not 1-1). You could also consider using HashMap (from the same package) rather than Map for the same reasons, but I did not get a noticeable change from doing so.
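For reference, the HashMap variant is nearly a drop-in replacement. This is only a sketch, assuming Data.HashMap.Strict from the same unordered-containers package, with the stop-word filtering elided.
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HM
import Data.List (foldl')
import Data.Text (Text)
import qualified Data.Text as T
type HashHistogram = HashMap Text Int
-- Same shape as the Map-based histogram, but keyed by hashing rather than
-- ordered comparison of Text keys.
hashHistogram :: Text -> HashHistogram
hashHistogram = foldl' (\m w -> HM.insertWith (+) w 1 m) HM.empty . T.words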
Another option is to increase the default heap size to take some of the pressure off the GC and to give it more room to move things around. Giving the compiled code a default heap size of 1 GB (the -H1G RTS flag, passed alongside -N4), I get a GC balance of about 50% on 4 cores, whereas I only get ~25% without it (and it also runs ~30% faster).
With these two alterations, the average runtime on four cores (on my machine) drops from ~10.5s to ~3.5s. Arguably, there is still room for improvement based on the GC statistics (it still spends only 58% of its time doing productive work), but doing significantly better might require a much more drastic change to your algorithm.
I think Daniel got it right: Data.Map and lists are lazy data structures; you should use both foldl' and insertWith' to ensure the work for each chunk is done eagerly, otherwise all of the work is delayed until the sequential part (reduce).
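A sketch of that eager version, using Data.Map.Strict.insertWith (the modern equivalent of the insertWith' named above), with the stop-word filter elided to keep it self-contained:
import Data.List (foldl')
import qualified Data.Map.Strict as M
import Data.Text (Text)
import qualified Data.Text as T
type Histogram = M.Map Text Int
histogram :: Text -> Histogram
histogram = foldl' step M.empty . T.words
  where
    -- insertWith from Data.Map.Strict evaluates the new count before storing
    -- it, and foldl' forces the accumulator on every step, so no thunks pile
    -- up inside the map.
    step m w = M.insertWith (+) w 1 m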
Also, it is not obvious that making a spark for each file is the right granularity, particularly if the file sizes differ substantially. If that can be the case, it would be preferable to concatenate the word lists and split them into even-sized chunks (see the parListChunk combinator).
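A sketch of that idea (the chunk sizes and names such as chunkedHistogram are illustrative, not from the original code): flatten every document into one word list, build a partial histogram per even-sized chunk, and let parListChunk group the chunks into sparks before a sequential merge.
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)
import Data.List (foldl')
import qualified Data.Map.Strict as M
import Data.Text (Text)
import qualified Data.Text as T
type Histogram = M.Map Text Int
-- Split a list into pieces of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t
chunkedHistogram :: [Text] -> Histogram
chunkedHistogram docs =
    let ws       = concatMap T.words docs                -- one big word list
        partial  = foldl' (\m w -> M.insertWith (+) w 1 m) M.empty
        partials = map partial (chunksOf 10000 ws)        -- even-sized work items
                     `using` parListChunk 8 rdeepseq      -- 8 partial maps per spark
    in M.unionsWith (+) partials                          -- sequential merge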
While you're at it, I would also look out for the pitfalls of using lazy IO (readFile) to open many files: the runtime system might run out of file handles because it holds onto them for too long.
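One way to avoid holding handles open (a sketch; readFileStrict is an illustrative name, not part of the original code) is to read each file inside withFile using the strict Text hGetContents, so the handle is closed as soon as the contents are in memory:
import Data.Text (Text)
import qualified Data.Text.IO as TI
import System.IO (IOMode (ReadMode), withFile)
-- Read a whole file strictly; the strict Text hGetContents consumes the
-- handle before withFile closes it.
readFileStrict :: FilePath -> IO Text
readFileStrict path = withFile path ReadMode TI.hGetContents
Used as mapM readFileStrict =<< textFiles, at most one handle is open at any time.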