Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Abnormally slow Haskell code

Tags:

haskell

I've been trying to practice with the Digits-Recognizer Dojo in Haskell after having done it in F#. I'm getting results, but for some reason my Haskell code is insanely slow, and I cannot seem to find what's wrong.

Here is my code (the .csv files can be found on the Dojo's GitHub):

import Data.Char
import Data.List
import Data.List.Split
import Data.Ord
import System.IO

type Pixels = [Int]

data Digit = Digit { label  :: Int, pixels :: Pixels }

distance :: Pixels -> Pixels -> Float
distance d1 d2 = sqrt . sum $ map pointDistance $ zip d1 d2
    where pointDistance (a, b) = fromIntegral $ (a - b) * (a - b)

parseDigit :: String -> Digit
parseDigit s = Digit label pixels
    where (label:pixels) = map read $ splitOn "," s

identify :: Digit -> [Digit] -> (Digit, Float)
identify digit training = minimumBy (comparing snd) distances
    where distances = map fn training
          fn ref    = (ref, distance (pixels digit) (pixels ref))

readDigits :: String -> IO [Digit]
readDigits filename = do
    fileContent <- readFile filename
    return $ map parseDigit $ tail $ lines fileContent

main :: IO ()
main = do
    trainingSample <- readDigits "trainingsample.csv"
    validationSample <- readDigits "validationsample.csv"
    let result               = [(d, identify d trainingSample) | d <- validationSample]
        fmt (d, (ref, dist)) = putStrLn $ "Found..."
    mapM_ fmt result

What would be the reason of these bad performances?


[UPDATE] Thank you for your many ideas! I have switched my usage of String to Data.Text and my usage of List to Data.Vector as suggested, unfortunately the result is still far from satisfactory.

My updated code is available here.

To give you a better understanding of my interrogation, here's the output of my Haskell (left) and F# (right) implementation. I'm a total newbie of both languages, so I sincerely believe that there has to be a major mistake in my Haskell version to be that much slower.

Terminal capture

like image 826
icecrime Avatar asked Jul 09 '14 22:07

icecrime


2 Answers

If you're patient, you'll notice that the second result is calculated much faster than the first. That's because your implementation takes some time to read in the csv files.

You may be tempted to stick a print statement to see when it's done loading like so:

main = do
    trainingSample <- readDigits "trainingsample.csv"
    validationSample <- readDigits "validationsample.csv"
    putStrLn "done loading data"

But due to lazyIO, this won't do what you think it does. trainingSample and validationSample are not yet fully evaluated. So your print statement will print almost immediately, and the first result will still take forever.

You can force readDigits to fully evaluate their return values, though, which will give you a better idea of how much time is spent there. You could either switch to using non-lazy IO, or just print something derived from the data:

readDigits :: String -> IO [Digit]
readDigits filename = do
    fileContent <- readFile filename
    putStr' $ filename ++ ": "
    rows <- forM (tail $ lines fileContent) $ \line -> do
      let xs = parseDigit line
      putStr' $ case compare (sum $ pixels xs) 0 of
                LT -> "-"
                EQ -> "0"
                GT -> "+"
      return xs
    putStrLn ""
    return rows
  where putStr' s = putStr s >> hFlush stdout

On my machine, this let me see that it took about 27 seconds to fully read the digits from trainingsample.csv.

This is printf-style profiling, which isn't great (much better to use a real profiler, or use criterion to benchmark various parts of your code), but good enough for these purposes.

That's clearly a major part of the slowdown, so it's worth trying to switch to strict io. Using Data.Text.IO.readFile, which is strict, cut it down to ~18 seconds.


UPDATE

Here's how to speed up your updated code:

  1. Use unboxed vectors for Pixels (small win):

    import qualified Data.Vector.Unboxed as U
    -- ...
    
    type Pixels = U.Vector Int
    -- ...
    
    distance :: Pixels -> Pixels -> Float
    distance d1 d2 = sqrt . U.sum $ U.zipWith pointDistance d1 d2
        where pointDistance a b = fromIntegral $ (a - b) * (a - b)
    
    parseDigit :: T.Text -> Digit
    parseDigit s = Digit label (U.fromList pixels)
        where (label:pixels) = map toDigit $ T.splitOn (T.pack ",") s
              toDigit s      = either (\_ -> 0) fst (T.Read.decimal s)
    
  2. Force the distance evaluation early by using seq (big win):

    identify :: Digit -> V.Vector Digit -> (Digit, Float)
    identify digit training = V.minimumBy (comparing snd) distances
        where distances = V.map fn training
              fn ref    = let d = distance (pixels digit) (pixels ref) in d `seq` (ref, d)
    

On my machine, the whole program now runs in ~5s:

% ghc --make -O2 Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
% time ./Main
./Main  5.00s user 0.11s system 99% cpu 5.115 total

The thunks were killing you.

like image 168
rampion Avatar answered Oct 21 '22 22:10

rampion


Your Vector's version, partially unboxed, adapted for ByteString and compiled with -O2 -fllvm runs in 8 seconds on my machine:

import Data.Ord
import Data.Maybe
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC


type Pixels = U.Vector Int

data Digit = Digit { label :: !Int, pixels :: !Pixels }


distance :: Pixels -> Pixels -> Float
distance d1 d2 = sqrt . U.sum . U.zipWith pointDistance d1 $ d2
    where pointDistance a b = fromIntegral $ (a - b) * (a - b)

parseDigit :: B.ByteString -> Digit
parseDigit bs =
    let (label:pixels) = toIntegers bs []
    in Digit label (U.fromList pixels)
    where
      toIntegers bs is =
          let Just (i,bs') = BC.readInt bs
          in if B.null bs' then reverse is else toIntegers (BC.tail bs') (i:is)               

identify :: Digit -> V.Vector Digit -> (Digit, Float)
identify digit training = V.minimumBy (comparing snd) distances
    where distances = V.map fn training
          fn ref    = (ref, distance (pixels digit) (pixels ref))

readDigits :: String -> IO (V.Vector Digit)
readDigits filename = do
    fileContent <- B.readFile filename
    return . V.map parseDigit . V.fromList . tail . BC.lines $ fileContent


main :: IO ()
main = do
    trainingSample <- readDigits "trainingsample.csv"
    validationSample <- readDigits "validationsample.csv"
    let result = V.map (\d -> (d, identify d trainingSample)) validationSample
        fmt (d, (ref, dist)) = putStrLn $ "Found " ++ show (label ref) ++ " for " ++ show (label d) ++ " (distance=" ++ show dist ++ ")"
    V.mapM_ fmt result

Output of +RTS -s:

     989,632,984 bytes allocated in the heap
      19,875,368 bytes copied during GC
      31,016,504 bytes maximum residency (5 sample(s))
      22,748,608 bytes maximum slop
              78 MB total memory in use (1 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1761 colls,     0 par    0.05s    0.05s     0.0000s    0.0008s
  Gen  1         5 colls,     0 par    0.00s    0.02s     0.0030s    0.0085s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    7.42s  (  7.69s elapsed)
  GC      time    0.05s  (  0.06s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    7.47s  (  7.77s elapsed)

  %GC     time       0.7%  (0.8% elapsed)

  Alloc rate    133,419,569 bytes per MUT second

  Productivity  99.3% of total user, 95.5% of total elapsed
like image 33
Ed'ka Avatar answered Oct 21 '22 22:10

Ed'ka