golang convert iso8859-1 to utf8

Question

I am trying to convert an ISO 8859-1 encoded string to UTF-8.

The following function works with my test data, which contains German umlauts, but I'm not sure what source encoding the rune(b) conversion assumes. Does it assume some default encoding, such as ISO 8859-1, or is there a way to tell it which encoding to use?

func toUtf8(iso8859_1_buf []byte) string {
   // Length 0 with preallocated capacity; a non-zero length would leave
   // leading NUL bytes in the output.
   buf := bytes.NewBuffer(make([]byte, 0, len(iso8859_1_buf)*4))
   for _, b := range iso8859_1_buf {
      r := rune(b)
      buf.WriteRune(r)
   }
   return buf.String()
}

ANisus · Accepted Answer

rune is an alias for int32, and when it comes to encoding, a rune is assumed to hold a Unicode code point. So the value b in rune(b) should be a Unicode code point. For 0x00 to 0xFF, the Unicode code points are identical to ISO 8859-1 (Latin-1), so you don't have to worry about it.
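To make this concrete, here is a small sketch showing that converting a Latin-1 byte to a rune gives the Unicode code point of the same character, and that converting that rune to a string produces its UTF-8 encoding. The helper name latin1ByteToRune is made up for illustration.

```go
package main

import "fmt"

// latin1ByteToRune is a hypothetical helper: because the first 256 Unicode
// code points match ISO 8859-1, a plain conversion is all that is needed.
func latin1ByteToRune(b byte) rune {
	return rune(b)
}

func main() {
	b := byte(0xE4) // 'ä' in ISO 8859-1
	r := latin1ByteToRune(b)

	fmt.Printf("%U %q\n", r, r)    // U+00E4 'ä'
	fmt.Printf("% x\n", string(r)) // c3 a4 (the UTF-8 encoding of U+00E4)
}
```

Note that this only holds for ISO 8859-1. Other single-byte encodings, such as Windows-1252, differ from Unicode in part of the 0x80 to 0x9F range and need a lookup table instead.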

Then you need to encode the runes as UTF-8, which is done simply by converting a []rune to a string.

This is an example of your function without using the bytes package:

func toUtf8(iso8859_1_buf []byte) string {
    buf := make([]rune, len(iso8859_1_buf))
    for i, b := range iso8859_1_buf {
        buf[i] = rune(b)
    }
    return string(buf)
}
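As a quick check, the following sketch runs the same idea on a short Latin-1 input and verifies that the result is valid UTF-8. It uses a variant named toUtf8Builder (a made-up name, not part of the answer above) that writes into a strings.Builder instead of an intermediate []rune. That is an assumption about what might be preferable for large inputs, not a measured result.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// toUtf8Builder is a hypothetical variant of toUtf8 that appends each
// converted rune to a strings.Builder.
func toUtf8Builder(iso8859_1_buf []byte) string {
	var sb strings.Builder
	// Every Latin-1 byte needs at most 2 bytes in UTF-8.
	sb.Grow(len(iso8859_1_buf) * 2)
	for _, b := range iso8859_1_buf {
		sb.WriteRune(rune(b))
	}
	return sb.String()
}

func main() {
	in := []byte{0x47, 0x72, 0xFC, 0xDF, 0x65} // "Grüße" in ISO 8859-1
	out := toUtf8Builder(in)

	// Prints: Grüße true 7
	fmt.Println(out, utf8.ValidString(out), len(out))
}
```

The output is 7 bytes long because ü and ß each take two bytes in UTF-8, while the ASCII letters take one.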

Tags: character-encoding, go

zeroc8
