It is a common knowledge that one does not use [Char]
to read large amounts of data in Haskell. One uses ByteString
s to do the job.
The usual explanation for this is that Char
s are large and lists add their overhead.
However, this does not seem to cause any problems with the output.
For example the following program:
main = interact $ const $ unwords $ map show $ replicate 500000 38000000
takes just 131 ms to run on my computer, while the following one:
import Data.List
sum' :: [Int] -> Int
sum' = foldl' (+) 0
main = interact $ show . sum' . map read . words
takes 3.38 seconds if fed the output of the first program as an input!
What is the reason for such a disparity between the input and output performance using String
s?
I don't think that this issue necessarily has to do with I/O. Rather, it demonstrates that the Read
instance for Int
is pretty inefficient.
First, consider the following program which just processes a lazy list. It takes 4.1s on my machine (compiled with -O2
):
main = print $ sum' $ map read $ words
$ unwords $ map show $ replicate 500000 38000000
Replacing the read
function with length
drops the time down to 0.48s:
main = print $ sum' $ map length $ words
$ unwords $ map show $ replicate 500000 38000000
Furthermore, replacing the read
function with a handwritten version results in a time of 0.52s:
main = print $ sum' $ map myread $ words
$ unwords $ map show $ replicate 500000 38000000
myread :: String -> Int
myread = loop 0
where
loop n [] = n
loop n (d:ds) = let d' = fromEnum d - fromEnum '0' :: Int
n' = 10 * n + d'
in loop n' ds
My guess as to why read
is so inefficient is that its implementation uses the Text.ParserCombinators.ReadP
module, which may not be the fastest choice for the simple case of reading a single integer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With