I've been trying to practice with the Digits-Recognizer Dojo in Haskell after having done it in F#. I'm getting results, but for some reason my Haskell code is insanely slow, and I cannot seem to find what's wrong.
Here is my code (the .csv
files can be found on the Dojo's GitHub):
import Data.Char
import Data.List
import Data.List.Split
import Data.Ord
import System.IO
type Pixels = [Int]
data Digit = Digit { label :: Int, pixels :: Pixels }
distance :: Pixels -> Pixels -> Float
distance d1 d2 = sqrt . sum $ map pointDistance $ zip d1 d2
where pointDistance (a, b) = fromIntegral $ (a - b) * (a - b)
parseDigit :: String -> Digit
parseDigit s = Digit label pixels
where (label:pixels) = map read $ splitOn "," s
identify :: Digit -> [Digit] -> (Digit, Float)
identify digit training = minimumBy (comparing snd) distances
where distances = map fn training
fn ref = (ref, distance (pixels digit) (pixels ref))
readDigits :: String -> IO [Digit]
readDigits filename = do
fileContent <- readFile filename
return $ map parseDigit $ tail $ lines fileContent
main :: IO ()
main = do
trainingSample <- readDigits "trainingsample.csv"
validationSample <- readDigits "validationsample.csv"
let result = [(d, identify d trainingSample) | d <- validationSample]
fmt (d, (ref, dist)) = putStrLn $ "Found..."
mapM_ fmt result
What would be the reason of these bad performances?
[UPDATE] Thank you for your many ideas! I have switched my usage of String
to Data.Text
and my usage of List
to Data.Vector
as suggested, unfortunately the result is still far from satisfactory.
My updated code is available here.
To give you a better understanding of my interrogation, here's the output of my Haskell (left) and F# (right) implementation. I'm a total newbie of both languages, so I sincerely believe that there has to be a major mistake in my Haskell version to be that much slower.
If you're patient, you'll notice that the second result is calculated much faster than the first. That's because your implementation takes some time to read in the csv files.
You may be tempted to stick a print statement to see when it's done loading like so:
main = do
trainingSample <- readDigits "trainingsample.csv"
validationSample <- readDigits "validationsample.csv"
putStrLn "done loading data"
But due to lazyIO, this won't do what you think it does. trainingSample
and validationSample
are not yet fully evaluated. So your print statement will print almost immediately, and the first result will still take forever.
You can force readDigits
to fully evaluate their return values, though, which will give you a better idea of how much time is spent there. You could either switch to using non-lazy IO, or just print something derived from the data:
readDigits :: String -> IO [Digit]
readDigits filename = do
fileContent <- readFile filename
putStr' $ filename ++ ": "
rows <- forM (tail $ lines fileContent) $ \line -> do
let xs = parseDigit line
putStr' $ case compare (sum $ pixels xs) 0 of
LT -> "-"
EQ -> "0"
GT -> "+"
return xs
putStrLn ""
return rows
where putStr' s = putStr s >> hFlush stdout
On my machine, this let me see that it took about 27 seconds to fully read the digits from trainingsample.csv
.
This is printf-style profiling, which isn't great (much better to use a real profiler, or use criterion to benchmark various parts of your code), but good enough for these purposes.
That's clearly a major part of the slowdown, so it's worth trying to switch to strict io. Using Data.Text.IO.readFile
, which is strict, cut it down to ~18 seconds.
UPDATE
Here's how to speed up your updated code:
Use unboxed vectors for Pixels
(small win):
import qualified Data.Vector.Unboxed as U
-- ...
type Pixels = U.Vector Int
-- ...
distance :: Pixels -> Pixels -> Float
distance d1 d2 = sqrt . U.sum $ U.zipWith pointDistance d1 d2
where pointDistance a b = fromIntegral $ (a - b) * (a - b)
parseDigit :: T.Text -> Digit
parseDigit s = Digit label (U.fromList pixels)
where (label:pixels) = map toDigit $ T.splitOn (T.pack ",") s
toDigit s = either (\_ -> 0) fst (T.Read.decimal s)
Force the distance evaluation early by using seq
(big win):
identify :: Digit -> V.Vector Digit -> (Digit, Float)
identify digit training = V.minimumBy (comparing snd) distances
where distances = V.map fn training
fn ref = let d = distance (pixels digit) (pixels ref) in d `seq` (ref, d)
On my machine, the whole program now runs in ~5s:
% ghc --make -O2 Main.hs
[1 of 1] Compiling Main ( Main.hs, Main.o )
Linking Main ...
% time ./Main
./Main 5.00s user 0.11s system 99% cpu 5.115 total
The thunks were killing you.
Your Vector's version, partially unboxed, adapted for ByteString and compiled with -O2 -fllvm
runs in 8 seconds on my machine:
import Data.Ord
import Data.Maybe
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
type Pixels = U.Vector Int
data Digit = Digit { label :: !Int, pixels :: !Pixels }
distance :: Pixels -> Pixels -> Float
distance d1 d2 = sqrt . U.sum . U.zipWith pointDistance d1 $ d2
where pointDistance a b = fromIntegral $ (a - b) * (a - b)
parseDigit :: B.ByteString -> Digit
parseDigit bs =
let (label:pixels) = toIntegers bs []
in Digit label (U.fromList pixels)
where
toIntegers bs is =
let Just (i,bs') = BC.readInt bs
in if B.null bs' then reverse is else toIntegers (BC.tail bs') (i:is)
identify :: Digit -> V.Vector Digit -> (Digit, Float)
identify digit training = V.minimumBy (comparing snd) distances
where distances = V.map fn training
fn ref = (ref, distance (pixels digit) (pixels ref))
readDigits :: String -> IO (V.Vector Digit)
readDigits filename = do
fileContent <- B.readFile filename
return . V.map parseDigit . V.fromList . tail . BC.lines $ fileContent
main :: IO ()
main = do
trainingSample <- readDigits "trainingsample.csv"
validationSample <- readDigits "validationsample.csv"
let result = V.map (\d -> (d, identify d trainingSample)) validationSample
fmt (d, (ref, dist)) = putStrLn $ "Found " ++ show (label ref) ++ " for " ++ show (label d) ++ " (distance=" ++ show dist ++ ")"
V.mapM_ fmt result
Output of +RTS -s
:
989,632,984 bytes allocated in the heap
19,875,368 bytes copied during GC
31,016,504 bytes maximum residency (5 sample(s))
22,748,608 bytes maximum slop
78 MB total memory in use (1 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1761 colls, 0 par 0.05s 0.05s 0.0000s 0.0008s
Gen 1 5 colls, 0 par 0.00s 0.02s 0.0030s 0.0085s
INIT time 0.00s ( 0.00s elapsed)
MUT time 7.42s ( 7.69s elapsed)
GC time 0.05s ( 0.06s elapsed)
EXIT time 0.00s ( 0.01s elapsed)
Total time 7.47s ( 7.77s elapsed)
%GC time 0.7% (0.8% elapsed)
Alloc rate 133,419,569 bytes per MUT second
Productivity 99.3% of total user, 95.5% of total elapsed
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With