I read about I/O buffering in "Real World Haskell" (ch. 7, p. 189) and tried to test how different buffer sizes affect performance.
import System.IO
import Data.Time.Clock
import Data.Char (toUpper)

main :: IO ()
main = do
  hInp <- openFile "bigFile.txt" ReadMode
  let bufferSize = truncate $ 2**10   -- 1024 bytes; change this to try other sizes
  hSetBuffering hInp (BlockBuffering (Just bufferSize))
  bufferMode <- hGetBuffering hInp
  putStrLn $ "Current buffering mode: " ++ show bufferMode
  startTime <- getCurrentTime
  inp <- hGetContents hInp              -- lazy: the file is read as writeFile consumes it
  writeFile "processed.txt" (map toUpper inp)
  hClose hInp
  finishTime <- getCurrentTime
  print $ diffUTCTime finishTime startTime
  return ()
Then I created a 96 MB "bigFile.txt":
-rw-rw-r-- 1 user user 96M Jan 26 09:49 bigFile.txt
and ran my program against this file with different buffer sizes:
Current buffering mode: BlockBuffering (Just 32)
9.744967s
Current buffering mode: BlockBuffering (Just 1024)
9.667924s
Current buffering mode: BlockBuffering (Just 1048576)
9.494807s
Current buffering mode: BlockBuffering (Just 1073741824)
9.792453s
But the running time is almost the same in every case. Is this normal, or am I doing something wrong?
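In the program above, the measured time also includes `map toUpper` over a lazy String and the `writeFile` output, which uses its own handle with default buffering. As a sketch under that observation, the version below times only the read side for each of the buffer sizes tried above, in a single run. `timeRead` is a hypothetical helper, and `bigFile.txt` is assumed to exist in the current directory. After the first pass the file will very likely be in the page cache, so the later sizes are measured "warm".

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM_)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.IO

-- Hypothetical helper: read a file through a Handle whose block buffer is
-- set to the given size, forcing the whole contents. Returns the number of
-- characters read and the elapsed wall-clock time.
timeRead :: FilePath -> Int -> IO (Int, NominalDiffTime)
timeRead path size = do
  h <- openFile path ReadMode
  hSetBuffering h (BlockBuffering (Just size))
  start <- getCurrentTime
  contents <- hGetContents h
  n <- evaluate (length contents)   -- force the lazy read to completion
  finish <- getCurrentTime
  hClose h
  return (n, diffUTCTime finish start)

main :: IO ()
main =
  forM_ [32, 1024, 1048576, 1073741824] $ \size -> do
    (n, t) <- timeRead "bigFile.txt" size
    putStrLn $ "BlockBuffering (Just " ++ show size ++ "): "
      ++ show n ++ " chars in " ++ show t
```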
On a modern OS, the buffer size is likely to have little effect when reading a file sequentially, because 1) the kernel performs read-ahead, and 2) the file may already be in the page cache if you have read it recently.
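A rough way to see the page-cache effect is to read the same file twice in one run and compare the two timings; on many systems the second pass is faster because the data is already in memory. This is a sketch only: it uses strict `Data.ByteString.readFile` so that String decoding does not dominate, `timedRead` is a hypothetical helper, and `bigFile.txt` is assumed to exist. The first read is only "cold" if the file is not already cached (on Linux, root can drop the page cache via `/proc/sys/vm/drop_caches`).

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as BS
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- Hypothetical helper: read a whole file strictly and return its size in
-- bytes together with the elapsed wall-clock time.
timedRead :: FilePath -> IO (Int, NominalDiffTime)
timedRead path = do
  start <- getCurrentTime
  bytes <- BS.readFile path          -- strict: reads the entire file
  n <- evaluate (BS.length bytes)
  finish <- getCurrentTime
  return (n, diffUTCTime finish start)

main :: IO ()
main = do
  (n1, t1) <- timedRead "bigFile.txt"   -- possibly cold cache
  (n2, t2) <- timedRead "bigFile.txt"   -- almost certainly warm cache
  putStrLn $ "first read:  " ++ show n1 ++ " bytes in " ++ show t1
  putStrLn $ "second read: " ++ show n2 ++ " bytes in " ++ show t2
```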
Buffering has a much clearer effect on writes. Here is a program that measures it; typical results are:
$ ./mkbigfile 32 -- 12.864733s
$ ./mkbigfile 64 -- 9.668272s
$ ./mkbigfile 128 -- 6.993664s
$ ./mkbigfile 512 -- 4.130989s
$ ./mkbigfile 1024 -- 3.536652s
$ ./mkbigfile 16384 -- 3.055403s
$ ./mkbigfile 1000000 -- 3.004879s
Source:
import Control.Monad (replicateM_)
import System.IO
import System.Environment (getArgs)
import Data.Time.Clock

main :: IO ()
main = do
  (arg:_) <- getArgs
  let size = read arg                 -- buffer size in bytes, from the command line
      bs = "abcdefghijklmnopqrstuvwxyz"
      n = 96000000 `div` length bs    -- number of lines for roughly 96 MB of output
  h <- openFile "bigFile.txt" WriteMode
  hSetBuffering h (BlockBuffering (Just size))
  startTime <- getCurrentTime
  replicateM_ n $ hPutStrLn h bs
  hClose h                            -- flushes whatever is left in the buffer
  finishTime <- getCurrentTime
  print $ diffUTCTime finishTime startTime
  return ()
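One way to read these numbers: assuming every full buffer is handed to the kernel in one write() call, the number of system calls is roughly the total output size divided by the buffer size. The program above writes 96000000 `div` 26 = 3692307 lines of 27 bytes each (26 letters plus a newline, assuming `\n` is written as a single byte as on Unix-like systems), about 99.7 MB. The hypothetical helper `writeCalls` below only does that arithmetic; it is a back-of-the-envelope model, not a measurement of what the GHC runtime actually does.

```haskell
-- Back-of-the-envelope model (an assumption, not GHC's exact behaviour):
-- each full buffer costs one write() system call, and a final partial
-- buffer costs one more.
writeCalls :: Integer -> Integer -> Integer
writeCalls total bufSize = (total + bufSize - 1) `div` bufSize

-- Bytes produced by the benchmark: n lines of 26 letters plus '\n'.
totalBytes :: Integer
totalBytes = n * 27
  where n = 96000000 `div` 26

main :: IO ()
main = mapM_ report [32, 64, 128, 512, 1024, 16384, 1000000]
  where
    report size =
      putStrLn $ show size ++ "-byte buffer: ~"
        ++ show (writeCalls totalBytes size) ++ " write() calls"
```

Under this model a 32-byte buffer needs about 3.1 million calls while a 1000000-byte buffer needs about 100, which is consistent with the timings levelling off once the buffer reaches a few KiB.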