 

Golang: convert ISO 8859-1 to UTF-8

I am trying to convert an ISO 8859-1 encoded string to UTF-8.

The following function works with my test data, which contains German umlauts, but I'm not quite sure what source encoding the rune(b) conversion assumes. Does it assume some default encoding, e.g. ISO 8859-1, or is there a way to tell it which encoding to use?

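import "bytes"

// toUtf8 decodes ISO 8859-1 (Latin-1) bytes into a UTF-8 string.
func toUtf8(iso8859_1_buf []byte) string {
    // Start from an empty buffer: bytes.NewBuffer(make([]byte, n)) would
    // pre-fill it with n zero bytes that end up at the front of the result.
    var buf bytes.Buffer
    buf.Grow(len(iso8859_1_buf) * 2) // a Latin-1 byte needs at most 2 UTF-8 bytes
    for _, b := range iso8859_1_buf {
        buf.WriteRune(rune(b))
    }
    return buf.String()
}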
zeroc8 asked Nov 22 '12 10:11




1 Answer

rune is an alias for int32, and Go treats a rune as a Unicode character value (code point). rune(b) does not assume any source encoding; it simply widens the byte value, so b is interpreted as a Unicode code point. This is exactly what you want for ISO 8859-1, because the Unicode code points U+0000 to U+00FF are identical to Latin-1, so you don't have to worry about it.
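To make the "byte value equals code point" point concrete, here is a small sketch. The helper name latin1ToRune is hypothetical and exists only for illustration; it does nothing beyond the plain rune(b) conversion, and the program prints each byte next to the code point it becomes.

```go
package main

import "fmt"

// latin1ToRune is a hypothetical helper for illustration: it widens one
// ISO 8859-1 byte to a rune. No encoding table is consulted; the byte
// value is used directly as the Unicode code point.
func latin1ToRune(b byte) rune {
	return rune(b)
}

func main() {
	// 'A', 'Ä', 'ß' and 'ü' as ISO 8859-1 bytes.
	for _, b := range []byte{0x41, 0xC4, 0xDF, 0xFC} {
		r := latin1ToRune(b)
		fmt.Printf("byte 0x%02X -> %U %q\n", b, r, r)
	}
}
```

Each line shows the same number twice, e.g. byte 0xFC becomes U+00FC, which is 'ü' in both ISO 8859-1 and Unicode.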

You then need to encode the runes as UTF-8, and Go does that for you: converting a []rune to a string encodes each rune as UTF-8.

This is an example of your function without using the bytes package:

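// toUtf8 decodes ISO 8859-1 bytes into a UTF-8 string.
func toUtf8(iso8859_1_buf []byte) string {
    buf := make([]rune, len(iso8859_1_buf))
    for i, b := range iso8859_1_buf {
        buf[i] = rune(b) // Latin-1 byte value == Unicode code point
    }
    return string(buf) // the conversion encodes the runes as UTF-8
}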
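If the intermediate []rune slice (4 bytes per input byte) matters, one possible variant writes the runes straight into a strings.Builder instead. This is a sketch, not part of the original answer; the name latin1ToUTF8 is hypothetical, and it relies on the same assumption that the input is genuine ISO 8859-1.

```go
package main

import (
	"fmt"
	"strings"
)

// latin1ToUTF8 is a hypothetical alternative to toUtf8 that skips the
// []rune slice and encodes each byte directly into the output.
func latin1ToUTF8(latin1 []byte) string {
	var sb strings.Builder
	// Bytes 0x00-0x7F stay 1 byte in UTF-8; 0x80-0xFF become 2 bytes.
	sb.Grow(len(latin1) * 2)
	for _, b := range latin1 {
		sb.WriteRune(rune(b)) // byte value == Unicode code point for Latin-1
	}
	return sb.String()
}

func main() {
	in := []byte{'G', 'r', 0xFC, 0xDF, 'e'} // "Grüße" encoded as ISO 8859-1
	fmt.Println(latin1ToUTF8(in))
}
```

Both versions give the same result; the Builder version just trades the rune slice for a single output buffer sized up front.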
ANisus answered Sep 29 '22 03:09