Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert from an encoding to UTF-8 in Go?

Tags:

unicode

go

I'm working on a project where I need to convert text from an encoding (for example Windows-1256 Arabic) to UTF-8.

How do I do this in Go?

like image 776
Ali Bahrami Avatar asked Sep 11 '15 07:09

Ali Bahrami


People also ask

How do I change my encoding to UTF-8?

UTF-8 Encoding in Notepad (Windows)Click File in the top-left corner of your screen. In the dialog which appears, select the following options: In the "Save as type" drop-down, select All Files. In the "Encoding" drop-down, select UTF-8.

How do I convert a file to UTF-8?

Click Tools, then select Web options. Go to the Encoding tab. In the dropdown for Save this document as: choose Unicode (UTF-8). Click Ok.

What is the default encoding for Golang?

For Go, UTF-8 is the default encoding for storing characters in a string.

Are Go strings UTF-8?

Go source code is always UTF-8. A string holds arbitrary bytes. A string literal, absent byte-level escapes, always holds valid UTF-8 sequences. Those sequences represent Unicode code points, called runes.


1 Answers

You can use the encoding package, which includes support for Windows-1256 via the package golang.org/x/text/encoding/charmap (in the example below, import this package and use charmap.Windows1256 instead of japanese.ShiftJIS).

Here's a short example which encodes a japanese UTF-8 string to ShiftJIS encoding and then decodes the ShiftJIS string back to UTF-8. Unfortunately it doesn't work on the playground since the playground doesn't have the "x" packages.

package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "strings"

    "golang.org/x/text/encoding/japanese"
    "golang.org/x/text/transform"
)

func main() {
    // the string we want to transform
    s := "今日は"
    fmt.Println(s)

    // --- Encoding: convert s from UTF-8 to ShiftJIS 
    // declare a bytes.Buffer b and an encoder which will write into this buffer
    var b bytes.Buffer
    wInUTF8 := transform.NewWriter(&b, japanese.ShiftJIS.NewEncoder())
    // encode our string
    wInUTF8.Write([]byte(s))
    wInUTF8.Close()
    // print the encoded bytes
    fmt.Printf("%#v\n", b)
    encS := b.String()
    fmt.Println(encS)

    // --- Decoding: convert encS from ShiftJIS to UTF8
    // declare a decoder which reads from the string we have just encoded
    rInUTF8 := transform.NewReader(strings.NewReader(encS), japanese.ShiftJIS.NewDecoder())
    // decode our string
    decBytes, _ := ioutil.ReadAll(rInUTF8)
    decS := string(decBytes)
    fmt.Println(decS)
}

There's a more complete example on the Japanese StackOverflow site. The text is Japanese, but the code should be self-explanatory: https://ja.stackoverflow.com/questions/6120

like image 163
rob74 Avatar answered Sep 29 '22 12:09

rob74