I want to process a binary file that is too large to read into memory. Currently I use ByteString.Lazy.readFile to stream the bytes. I thought it would be a good idea to use the streaming package to make my program faster. However, the documentation for readFile says:
readFile :: FilePath -> (Stream (Of String) IO () -> IO a) -> IO a
Read the lines of a file, using a function of the type: 'Stream (Of String) IO () -> IO a' to turn the stream into a value of type 'IO a'.
So the streaming package only reads ASCII text files? Can I use this package to read a binary file as bytes?
To elaborate on @Cubic's comment, while there's a general consensus that lazy I/O should be avoided in production code and replaced with a streaming approach, this is not directly related to performance. If you're writing a program to do some one-off processing of a large file, as long as you have a lazy I/O version running fine now, there's probably no good performance reason to convert it over to a streaming package.
In fact, streaming is more likely to add some overhead, so I suspect that in most cases a well-optimized lazy I/O solution would outperform a well-optimized streaming solution.
The main reasons for avoiding Lazy I/O have been previously discussed on SO. In a nutshell, lazy I/O makes it difficult to consistently manage resources (e.g., file handles and network sockets), makes it hard to reason about space usage (e.g., a small program change can cause your memory usage to explode), and is occasionally "unsafe" if the timing and ordering of the I/O in question matters (usually not a problem if you're just reading in one set of files and/or writing out another set of files).
Short-running utility programs for reading and/or writing large files are probably good candidates to be written in a lazy I/O style. As long as they don't have any obvious space leaks when they're run, they're probably fine.
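As an illustration of that style, here is a minimal sketch of a short utility that counts how often a given byte occurs in a file. The name countByte is hypothetical and not part of any library; the sketch assumes only the bytestring package. BL.readFile reads the file lazily in chunks, and forcing the count before returning consumes the whole file, so memory use should stay bounded as long as nothing else holds on to the contents.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word8)

-- Hypothetical helper: count occurrences of one byte value in a file.
-- The file is read lazily, chunk by chunk, as BL.count walks over it.
countByte :: Word8 -> FilePath -> IO Int64
countByte w path = do
  contents <- BL.readFile path
  -- ($!) forces the count here, so the file is fully consumed
  -- before the result is returned to the caller.
  pure $! BL.count w contents
```

For example, countByte 0 "data.bin" would count the zero bytes in a file named data.bin (a placeholder path).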
With only streaming and bytestring, one can write something like:
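import Data.ByteString
import Streaming
import qualified Streaming.Prelude as S
import System.IO

-- Stream the contents of a Handle as strict chunks of at most chunkSize bytes.
fromHandle :: Int -> Handle -> Stream (Of ByteString) IO ()
fromHandle chunkSize h =
  S.untilRight $ do
    bytes <- Data.ByteString.hGet h chunkSize
    -- hGet returns an empty ByteString at end of file, which ends the stream.
    pure $ if Data.ByteString.null bytes
      then Right ()
      else Left bytes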
This uses hGet and null from bytestring, and untilRight from streaming. You will need to use withFile to get the Handle, and consume the Stream within the callback:
dump :: FilePath -> IO ()
dump file = withFile file ReadMode go
  where
    go :: Handle -> IO ()
    go = S.mapM_ (Data.ByteString.hPut stdout) . fromHandle 4096
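The same pattern works for any consumer that finishes inside the withFile callback. As a further sketch, the following computes the total size of a file by summing the chunk lengths. byteCount is a hypothetical name introduced for illustration, and fromHandle is repeated (with qualified imports) only so that the block compiles on its own.

```haskell
import qualified Data.ByteString as B
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S
import System.IO (Handle, IOMode (ReadMode), withFile)

-- Chunked reader, as defined earlier in this answer.
fromHandle :: Int -> Handle -> Stream (Of B.ByteString) IO ()
fromHandle chunkSize h =
  S.untilRight $ do
    bytes <- B.hGet h chunkSize
    pure $ if B.null bytes then Right () else Left bytes

-- Hypothetical helper: total number of bytes in a file.
-- S.sum_ runs the stream to completion, so all reads happen
-- before withFile closes the Handle.
byteCount :: FilePath -> IO Int
byteCount file = withFile file ReadMode go
  where
    go :: Handle -> IO Int
    go = S.sum_ . S.map B.length . fromHandle 4096
```

Because the fold is strict and runs inside the callback, no part of the stream escapes the lifetime of the Handle.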