The documentation for Data.ByteString.hGetContents says
As with hGet, the string representation in the file is assumed to be ISO-8859-1.
Why should it have to "assume" anything about the "string representation in the file"? The data is not necessarily strings or encoded text at all. If I wanted something to deal with encoded text I'd use Data.Text or perhaps Data.ByteString.Char8. I thought the whole point of ByteString is that the data is handled as a list of 8-bit bytes, not as text characters. What is the impact of the assumption that it is ISO-8859-1?
It's a roundabout way to say the same thing - no decoding is performed (since the encoding is 8-bit, nothing needs to be done), so hGetContents
gives you bytes in range 0x00 - 0xFF:
$ cat utf-8.txt
ÇÈÄ
$ iconv -f iso8859-1 iso8859-1.txt
ÇÈÄ
$ ghci
> openFile "iso8859-1.txt" ReadMode >>= (\h -> fmap BS.unpack $ BS.hGetContents h)
[199,200,196,10]
> openFile "utf-8.txt" ReadMode >>= (\h -> fmap BS.unpack $ BS.hGetContents h)
[195,135,195,136,195,132,10]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With