I'm trying to port some Java to Go. The Java code has a character variable with the value '\ud83f'. When I try to use this value in Go, it doesn't compile:
package main
func main() {
c := '\ud83f'
println(c)
}
$ go run a.go
# command-line-arguments
./a.go:3: invalid Unicode code point in escape sequence: 0xd83f
Why? I also tried making a string with that value in Python, and it worked there. It just doesn't work in Go for some reason.
The rune literal you tried to use is invalid because it denotes a surrogate code point. The spec says rune literals cannot denote surrogate halves (it names these "in particular", without giving an exhaustive list of the illegal values):
Rune Literals
[...]
The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
Further below in the examples, you can see another case which is deemed illegal:
'\U00110000' // illegal: invalid Unicode code point
This implies that invalid code points (such as those above 0x10FFFF) are also illegal in rune literals.
Note that since rune is merely an alias for int32, you can simply do:
var r rune = 0xd8f3
instead of
var r rune = '\ud8f3'
And if you wanted a value above 0x10FFFF, you could write
var r rune = 0x11ffff
instead of
var r rune = '\U0011ffff'
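To illustrate what this workaround gives you, here is a minimal sketch. It assumes you only need the numeric value: the assignment compiles, but the rune is still not a valid code point, so converting it to a string substitutes U+FFFD. The helper name isEncodable is made up for this example; utf8.ValidRune is the standard library call it wraps.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isEncodable is a hypothetical helper: it reports whether r can be
// legally encoded as UTF-8 (surrogate halves and out-of-range values cannot).
func isEncodable(r rune) bool {
	return utf8.ValidRune(r)
}

func main() {
	// Compiles: this is an integer constant assigned to an int32, not a rune literal.
	var r rune = 0xd83f

	fmt.Println(isEncodable(r))        // false: a surrogate half is not a valid rune
	fmt.Println(string(r) == "\uFFFD") // true: the conversion substitutes U+FFFD
}
```

So the integer form is fine for comparisons and arithmetic, but the value does not survive a round trip through a Go string.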
As already mentioned, \ud83f is a surrogate half, used in UTF-16 encoding.
On its own it is not considered a valid code point, and the Go specification explicitly states:
The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
If you want a rune with this invalid code point, you can do the following:
c := rune(0xd83f)
But the correct way to handle such a value is to first decode the two surrogate halves, then use the resulting valid code point.
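A minimal sketch of that decoding step, using utf16.DecodeRune from the standard library. The question only shows the high half (0xd83f), so the low half 0xdc00 below is a made-up value for illustration, and decodePair is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// decodePair is a hypothetical helper that combines a UTF-16 surrogate pair
// into a single code point. It returns false if hi/lo do not form a valid pair.
func decodePair(hi, lo rune) (rune, bool) {
	r := utf16.DecodeRune(hi, lo)
	if r == utf8.RuneError { // DecodeRune returns U+FFFD for an invalid pair
		return 0, false
	}
	return r, true
}

func main() {
	hi := rune(0xd83f) // high surrogate from the question
	lo := rune(0xdc00) // assumed low surrogate, for illustration only

	if r, ok := decodePair(hi, lo); ok {
		fmt.Printf("%U\n", r) // prints U+1FC00
	} else {
		fmt.Println("not a valid surrogate pair")
	}
}
```

When porting from Java, where a char holds one UTF-16 code unit, this usually means reading the char that follows the high surrogate and passing both to the decoder.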