I have to parse a file, so I have to read it first. Here is my program:
import qualified Data.ByteString.Char8 as B
import System.Environment

main = do
    args <- getArgs
    let path = args !! 0
    content <- B.readFile path
    let lines = B.lines content
    foobar lines

foobar :: [B.ByteString] -> IO ()
foobar _ = return ()
But after compiling it with
> ghc --make -O2 tmp.hs
running it fails with the following error when called on a 7 GB file:
> ./tmp big_big_file.dat
> tmp: {handle: big_big_file.dat}: hGet: illegal ByteString size (-1501792951): illegal operation
Thanks for any reply!
The length of a ByteString is an Int. If Int is 32 bits, a 7GB file exceeds the range of Int, so the size used for the buffer request wraps around to a wrong value, which can easily be negative.
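To make the wrap-around concrete, here is a small sketch of the arithmetic. The file size 7088141641 is my assumption: it is a roughly 7GB value picked because it reproduces the number in your error message, not something taken from your file. The helper name wrapToInt32 is made up for the illustration, and Int32 stands in for a 32-bit Int.

```haskell
import Data.Int (Int32)

-- fromIntegral to a fixed-width type keeps only the low 32 bits
-- and reinterprets them as a signed number.
wrapToInt32 :: Integer -> Int32
wrapToInt32 = fromIntegral

main :: IO ()
main = do
  -- Hypothetical size, chosen to be consistent with the error message.
  let fileSize = 7088141641 :: Integer
  print (wrapToInt32 fileSize)  -- prints -1501792951
```

7088141641 minus 2^32 is 2793174345, which is still above 2^31 - 1, so it shows up as 2793174345 - 2^32 = -1501792951.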
The code for readFile converts the file size to Int for the buffer request:
readFile :: FilePath -> IO ByteString
readFile f = bracket (openBinaryFile f ReadMode) hClose
    (\h -> hFileSize h >>= hGet h . fromIntegral)
and if that overflows, an "illegal ByteString size" error or a segmentation fault are the most likely outcomes.
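If you want a clear failure instead of the overflow, you could check the size yourself before asking for the buffer. This is only a sketch of that idea, not what the library does: fitsInInt and readFileChecked are names I made up, and the error text is arbitrary.

```haskell
import qualified Data.ByteString as B
import System.IO

-- True when a size reported by hFileSize is representable as an Int
-- on the current platform.
fitsInInt :: Integer -> Bool
fitsInInt n = n >= 0 && n <= toInteger (maxBound :: Int)

-- Hypothetical helper: read the file strictly only if its size fits in Int,
-- otherwise return a message instead of requesting a bogus buffer size.
readFileChecked :: FilePath -> IO (Either String B.ByteString)
readFileChecked path =
  withBinaryFile path ReadMode $ \h -> do
    size <- hFileSize h
    if fitsInInt size
      then Right <$> B.hGet h (fromIntegral size)
      else return (Left ("too large for a strict ByteString: " ++ show size ++ " bytes"))

main :: IO ()
main = do
  print (fitsInInt 1024)
  print (fitsInInt (toInteger (maxBound :: Int) + 1))
```

This only turns the failure into something readable. It does not make a 7GB strict ByteString possible with a 32-bit Int.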
If at all possible, use lazy ByteStrings to handle files that big. In your case, you pretty much have to make it possible, since with 32-bit Ints, a 7GB ByteString is impossible to create.
If you need the lines to be strict ByteStrings for the processing, and no line is exceedingly long, you can go through lazy ByteStrings to achieve that:
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.ByteString.Char8 as C

main = do
    ...
    content <- LC.readFile path
    let llns = LC.lines content
        slns = map (C.concat . LC.toChunks) llns
    foobar slns
but if you can modify your processing to deal with lazy ByteStrings, that will probably be better overall.
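As a rough illustration of processing that stays on lazy ByteStrings, here is a sketch that walks the lines once and keeps only a small summary. The summarize function is a stand-in I invented for whatever your real foobar does. The point is just that the lines are consumed one at a time instead of being held in memory together.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import System.Environment (getArgs)

-- Hypothetical processing step: count the lines and find the longest one.
-- Both accumulators are forced at every step so no chain of thunks builds up.
summarize :: LC.ByteString -> (Int, Int)
summarize = go 0 0 . LC.lines
  where
    go n m []       = (n, m)
    go n m (l : ls) =
      let n' = n + 1
          m' = max m (fromIntegral (LC.length l))
      in n' `seq` m' `seq` go n' m' ls

main :: IO ()
main = do
  (path : _) <- getArgs
  content <- LC.readFile path   -- read lazily, chunk by chunk
  print (summarize content)
```

Because content is not used again after summarize, the chunks that have already been consumed can be garbage collected while the rest of the file is still being read.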
When Int is 32 bits, strict ByteStrings only support up to 2 GiB of memory. You need to use lazy ByteStrings for it to work.