ByteString assumes ISO-8859-1?

Question

The documentation for Data.ByteString.hGetContents says:

"As with hGet, the string representation in the file is assumed to be ISO-8859-1."

Why should it have to "assume" anything about the "string representation in the file"? The data is not necessarily strings or encoded text at all. If I wanted something to deal with encoded text I'd use Data.Text or perhaps Data.ByteString.Char8. I thought the whole point of ByteString is that the data is handled as a list of 8-bit bytes, not as text characters. What is the impact of the assumption that it is ISO-8859-1?

Mikhail Glushenkov · Accepted Answer

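It's a roundabout way of saying the same thing: no decoding is performed. ISO-8859-1 is a single-byte encoding in which each byte value 0x00-0xFF maps to the character with the same code point, so nothing needs to be done, and hGetContents gives you the raw bytes in the range 0x00-0xFF regardless of what the file actually contains: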

$ cat utf-8.txt
ÇÈÄ
$ iconv -f iso8859-1 iso8859-1.txt
ÇÈÄ
$ ghci
> import System.IO
> import qualified Data.ByteString as BS
> openFile "iso8859-1.txt" ReadMode >>= (\h -> fmap BS.unpack $ BS.hGetContents h)
[199,200,196,10]
> openFile "utf-8.txt" ReadMode >>= (\h -> fmap BS.unpack $ BS.hGetContents h)
[195,135,195,136,195,132,10]
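
To illustrate what the "ISO-8859-1 assumption" amounts to in practice, here is a minimal sketch that skips the files and builds the same byte sequences directly with BS.pack. The names latin1Bytes, utf8Bytes, asLatin1 and codePoints are made up for this example and are not part of the bytestring API. It uses Data.ByteString.Char8.unpack, which views each byte as the Char with the same numeric value, so the bytes come back unchanged and a UTF-8 file is not decoded at all:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)
import Data.Word (Word8)

-- Bytes of "ÇÈÄ\n" as stored in iso8859-1.txt (hypothetical stand-in for the file).
latin1Bytes :: [Word8]
latin1Bytes = [199, 200, 196, 10]

-- Bytes of "ÇÈÄ\n" as stored in utf-8.txt (hypothetical stand-in for the file).
utf8Bytes :: [Word8]
utf8Bytes = [195, 135, 195, 136, 195, 132, 10]

-- View a ByteString as ISO-8859-1 text: each byte becomes the Char
-- with the same numeric value, so no real decoding takes place.
asLatin1 :: BS.ByteString -> String
asLatin1 = BC.unpack

-- Map each Char back to its code point; this recovers the byte values.
codePoints :: BS.ByteString -> [Int]
codePoints = map ord . asLatin1

main :: IO ()
main = do
  let latin1 = BS.pack latin1Bytes
      utf8   = BS.pack utf8Bytes
  print (BS.unpack latin1)           -- the raw bytes
  print (codePoints latin1)          -- the same numbers, seen as code points
  -- The UTF-8 bytes are not decoded: 7 bytes give 7 Chars, not 4.
  print (length (asLatin1 utf8))
```

Run as a program, this prints [199,200,196,10] twice and then 7. To get four characters from the UTF-8 bytes you would need an explicit decoding step, for example with Data.Text.Encoding from the text package.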

Tags: haskell, bytestring

Omari Norman
