Convert Between Latin1-encoded Data.ByteString and Data.Text

The latin-1 (aka ISO-8859-1) character set is embedded in Unicode as its lowest 256 code points, so I'd expect the conversion to be trivial. However, I didn't see any latin-1 conversion functions in Data.Text.Encoding, which only covers the common UTF encodings.

What's the recommended and/or most efficient way to convert between latin-1 encoded Data.ByteString values and Data.Text values?

hvr asked Sep 25 '11 10:09

1 Answer

The answer is right at the top of the page you linked:

To gain access to a much larger family of encodings, use the text-icu package: http://hackage.haskell.org/package/text-icu

A quick GHCi example:

λ> import Data.Text.ICU.Convert
λ> conv <- open "ISO-8859-1" Nothing
λ> Data.Text.IO.putStrLn $ toUnicode conv $ Data.ByteString.pack [198, 216, 197]
ÆØÅ
λ> Data.ByteString.unpack $ fromUnicode conv $ Data.Text.pack "ÆØÅ"
[198,216,197]

However, as you pointed out, latin-1 code points coincide with the first 256 Unicode code points. In this specific case you can therefore use pack/unpack from Data.ByteString.Char8 to map trivially between a latin-1 ByteString and a String, and then convert between String and Text with the corresponding pack/unpack from Data.Text.
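
A minimal sketch of that String-based round trip, in case it helps. The helper names latin1ToText and textToLatin1 are made up for illustration and are not part of any library. Note that Data.ByteString.Char8.pack truncates each character to 8 bits, so the encoding direction is only meaningful for Text that contains no code points above 255.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T

-- Hypothetical helper: decode a latin-1 ByteString into Text.
-- Each byte n becomes the Unicode code point n, via an intermediate String.
latin1ToText :: B.ByteString -> T.Text
latin1ToText = T.pack . BC.unpack

-- Hypothetical helper: encode Text as a latin-1 ByteString.
-- Assumption: every character is <= U+00FF; BC.pack silently
-- truncates anything larger to its low 8 bits.
textToLatin1 :: T.Text -> B.ByteString
textToLatin1 = BC.pack . T.unpack
```

Going through String is simple but not especially efficient for large inputs. If your version of the text package exports decodeLatin1 from Data.Text.Encoding, that covers the decoding direction without the intermediate String.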

hammar answered Sep 25 '22 15:09
