This is my code for counting lines and words:
import System.IO
import Data.List
main = do
hSetBinaryMode stdin True
interact $ (\(w,l)->"line:"++(show l)++"\nwords:"++(show w)++"\n")
. foldl' (\(w,l) r-> w `seq` l `seq` (w+length r ,succ l) ) (0,0)
. lines
This takes about 10 seconds to run on a file of about 100 megabytes. I compared it to similar programs in Lua (9s), awk (20s), and wc -l -c
(0.6s).
Why is this code so slow? What could be the problem?
I/O with String
is known to be less than fast in Haskell. The bytes read from the handle in general have to be converted to Unicode code points, and then a linked list is built from those. That's a lot of work causing a lot of allocation. In this case, the conversion to code points is a bit simpler, since you set stdin to binary mode, but the construction of the linked list of characters still takes a lot of time.
Another small factor is that your line count is using Integer
, but that's minor and only plays a significant role when the I/O is up to speed.
If you need fast I/O, you have to use a type better suited for that. One possibility is using ByteString
, for example
import Data.List
import qualified Data.ByteString.Lazy.Char8 as C
main = do
txt <- C.getContents
putStrLn $ (\(w,l)->"line:"++(show l)++"\nwords:"++(show w)++"\n"). foldl' (\(w,l) r-> w `seq` l `seq` (w+C.length r ,succ l) ) (0,0) . C.lines $ txt
does the job on a 94MB file in 0.12s on my box (wc -l -c takes 0.06s), while the original using String
took 4.4s. It can be optimised further,
{-# LANGUAGE BangPatterns #-}
import Data.List
import qualified Data.ByteString.Lazy.Char8 as C
main = do
txt <- C.getContents
putStrLn $ (\(w,l)->"line:"++(show l)++"\nwords:"++(show w)++"\n"). loop 0 0 . C.lines $ txt
loop :: Int -> Int -> [C.ByteString] -> (Int,Int)
loop !w !l (ln:lns) = loop (w + fromIntegral (C.length ln)) (l+1) lns
loop w l _ = (w,l)
takes only 0.08s, which is decent enough for me to stop optimising there (a similar change for the String
version brings the time down to 3.6s for that).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With