Let's say I have a text file like this.
\u0053
\u0075
\u006E
Is there a way I can convert that to this?
S
u
n
Currently, I'm using ioutil.ReadFile("data.txt"), but when I print the data, I get the Unicode escape sequences instead of the characters they represent. I realize this is the correct behavior for ReadFile; it's just not what I want.
I'm aiming to replace each escape sequence with the literal character it stands for.
You CAN'T convert from Unicode to ASCII. Almost every character in Unicode cannot be expressed in ASCII, and those that can be expressed have exactly the same codepoints in ASCII as in UTF-8, which is probably what you have.
Strings can be created by enclosing a set of characters inside double quotes " ". Strings in Go are Unicode compliant and are UTF-8 encoded.
In short, Go source code is UTF-8, so the source code for the string literal is UTF-8 text.
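A minimal sketch of why the file contents print as escape sequences: the compiler resolves \u0053 only inside an interpreted (double-quoted) literal in source code, while text read from a file holds the six characters verbatim, just like a raw (back-quoted) literal. The helper name escapeLengths is made up for this illustration.

```go
package main

import "fmt"

// escapeLengths returns the byte length of a raw literal that holds the six
// characters \u0053 (what the text file contains) and of an interpreted
// literal in which the compiler has already resolved the escape.
func escapeLengths() (raw, interpreted int) {
	fileText := `\u0053`   // raw literal: backslash, 'u', and four digits
	sourceText := "\u0053" // interpreted literal: the single character S
	return len(fileText), len(sourceText)
}

func main() {
	raw, interpreted := escapeLengths()
	fmt.Println(raw, interpreted) // 6 1
}
```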
You can use the strconv.Unquote() and strconv.UnquoteChar() functions to do the conversion. One thing you should be aware of is that strconv.Unquote() can only unquote strings that are in quotes (i.e. that start and end with a quote char " or a back quote char `), so we have to add the quotes manually.
See this example:
lines := []string{
	`\u0053`,
	`\u0075`,
	`\u006E`,
}
fmt.Println(lines)
for i, v := range lines {
	var err error
	lines[i], err = strconv.Unquote(`"` + v + `"`)
	if err != nil {
		fmt.Println(err)
	}
}
fmt.Println(lines)
fmt.Println(strconv.Unquote(`"Go\u0070\x68\x65\x72"`))
Output (try it on the Go Playground):
[\u0053 \u0075 \u006E]
[S u n]
Gopher <nil>
If the strings you want to unquote contain the escape sequence of a single rune (or you just want to unquote the first rune), you may use strconv.UnquoteChar(). This is what it looks like (note: the input does not need to be quoted in this case, unlike with strconv.Unquote()):
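runes := []string{
	`\u0053`,
	`\u0075`,
	`\u006E`,
}
fmt.Println(runes)
for _, v := range runes {
	// Only the first decoded rune is needed; the multibyte flag and the
	// remaining tail are ignored.
	value, _, _, err := strconv.UnquoteChar(v, 0)
	if err != nil {
		fmt.Println(err)
		continue // do not print a rune for input that failed to decode
	}
	fmt.Printf("%c\n", value)
}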
This will output (try it on the Go Playground):
[\u0053 \u0075 \u006E]
S
u
n
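strconv.UnquoteChar() also returns the rest of the string after the decoded character, so it can be called in a loop when a line holds more than one escape. A sketch under that assumption follows; decodeEscapes is a made-up helper name, and it passes 0 as the quote argument, which leaves both quote characters unescaped in the input.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeEscapes decodes every escape sequence in s by calling
// strconv.UnquoteChar repeatedly on the remaining tail.
func decodeEscapes(s string) (string, error) {
	var b strings.Builder
	for len(s) > 0 {
		r, multibyte, tail, err := strconv.UnquoteChar(s, 0)
		if err != nil {
			return "", err
		}
		if multibyte {
			b.WriteRune(r) // \u and \U escapes, or a literal non-ASCII rune
		} else {
			b.WriteByte(byte(r)) // ASCII, or a single byte from \x / octal
		}
		s = tail
	}
	return b.String(), nil
}

func main() {
	fmt.Println(decodeEscapes(`\u0053\u0075\u006E`)) // Sun <nil>
}
```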
A slightly different approach is to use strconv.ParseInt, which generates less garbage and uses less internal logic (Unquote does a lot of other checks) for parsing the lines:
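for i, v := range lines {
	// Only handle lines of the exact form \uXXXX (6 bytes).
	if len(v) != 6 {
		continue
	}
	// Skip the leading \u and parse the four hex digits.
	if r, err := strconv.ParseInt(v[2:], 16, 32); err == nil {
		// Convert via rune: string(int64) is flagged by go vet.
		lines[i] = string(rune(r))
	}
}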
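The length check alone does not confirm that a line really starts with \u, and ParseInt accepts a sign. A slightly stricter variant might look like the sketch below; parseEscape is a hypothetical helper, and it swaps in strconv.ParseUint and utf8.ValidRune, which the snippet above does not use.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// parseEscape converts a string of the exact form \uXXXX to its rune.
// It reports false for anything else, including surrogate code points.
func parseEscape(s string) (rune, bool) {
	if len(s) != 6 || !strings.HasPrefix(s, `\u`) {
		return 0, false
	}
	// ParseUint rejects a leading sign, unlike ParseInt.
	n, err := strconv.ParseUint(s[2:], 16, 32)
	if err != nil || !utf8.ValidRune(rune(n)) {
		return 0, false
	}
	return rune(n), true
}

func main() {
	lines := []string{`\u0053`, `\u0075`, `\u006E`, `plain`}
	for i, v := range lines {
		if r, ok := parseEscape(v); ok {
			lines[i] = string(r)
		}
	}
	fmt.Println(lines) // [S u n plain]
}
```

Lines that do not match are left untouched, which mirrors the continue in the loop above.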