Invalid Unicode code point 0xd83f

Question

I'm trying to port some Java to Go. The Java code has a character variable with the value '\ud83f'. When I try to use this value in Go, it doesn't compile:

package main
func main() {
    c := '\ud83f'
    println(c)
}

$ go run a.go
# command-line-arguments
./a.go:3: invalid Unicode code point in escape sequence: 0xd83f

Why? The value works in Java, and when I tried making a string with it in Python, that worked too. It only fails in Go.

Harold R. Eason · Accepted Answer

That rune literal you tried to use is invalid because it denotes a surrogate code point. The spec says the escapes in rune literals cannot denote surrogate halves, and it rules out some other values as well:

Rune Literals

[...]

The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.

Further below in the examples, you can see another case which is deemed illegal:

'\U00110000' // illegal: invalid Unicode code point

This confirms that values above 0x10FFFF, which lie outside the Unicode range, are illegal in rune literals too.
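To check which values this rule rejects, the standard library's utf8.ValidRune reports whether a rune is a valid Unicode scalar value: it returns false for surrogate halves (0xD800 to 0xDFFF) and for anything above utf8.MaxRune (0x10FFFF). A minimal sketch, assuming the compiler's check on \u and \U escapes matches ValidRune; the helper name legalInEscape is hypothetical:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// legalInEscape is a hypothetical helper: it reports whether v could be
// written in a \u or \U escape, i.e. whether it is a valid Unicode scalar
// value. utf8.ValidRune rejects surrogate halves (0xD800-0xDFFF) and values
// above utf8.MaxRune (0x10FFFF).
func legalInEscape(v rune) bool {
	return utf8.ValidRune(v)
}

func main() {
	// Ordinary letter, two surrogates, the first value after the surrogate
	// range, the largest code point, and one past it.
	for _, v := range []rune{0x41, 0xd83f, 0xdfff, 0xe000, 0x10ffff, 0x110000} {
		fmt.Printf("%#x legal: %v\n", v, legalInEscape(v))
	}
}
```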

Note that since rune is merely an alias for int32, you can simply do:

var r rune = 0xd83f

instead of

var r rune = '\ud83f'

Likewise, if you want a value above 0x10FFFF, you can do

var r rune = 0x11ffff

instead of

var r rune = '\U0011ffff'
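A small sketch of the point above: because rune is just int32, plain integer constants compile even when they are not valid code points, since only \u and \U escapes are validated. The variable names are illustrative, and utf8.ValidRune is used only to show that the stored values are still invalid Unicode:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// rune is an alias for int32, so any integer constant in range is accepted.
// Only code points written as \u or \U escapes are checked by the compiler.
var (
	surrogate rune = 0xd83f   // high surrogate half, the value '\ud83f' would denote
	tooBig    rune = 0x11ffff // above utf8.MaxRune (0x10FFFF)
)

// Because rune and int32 are identical types, no conversion is needed here.
var asInt32 int32 = surrogate

func main() {
	// Both values are stored unchanged, but neither is a valid code point.
	fmt.Printf("%#x valid: %v\n", surrogate, utf8.ValidRune(surrogate))
	fmt.Printf("%#x valid: %v\n", tooBig, utf8.ValidRune(tooBig))
	fmt.Println(asInt32 == 0xd83f)
}
```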

ANisus · Answer

As already mentioned, \ud83f is a surrogate half (a high surrogate), which is only meaningful inside UTF-16 encoding. It is not considered a valid code point, and the Go specification explicitly states:

The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.

If you want a rune with this invalid code point, you can do the following:

c := rune(0xd83f)
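Be aware that such a rune does not survive conversion to a string: the Go spec says converting an integer that is not a valid code point to a string yields "\uFFFD", and utf8.RuneLen returns -1 for runes that cannot be encoded. A minimal sketch; the helper names toString and encodedLen are hypothetical:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toString converts a rune to a string. Values that are not valid Unicode
// code points (including surrogate halves) become "\uFFFD".
func toString(r rune) string {
	return string(r)
}

// encodedLen reports how many bytes r needs in UTF-8, or -1 if r cannot be
// encoded (utf8.RuneLen returns -1 for surrogates and out-of-range values).
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	c := rune(0xd83f)
	s := toString(c)
	fmt.Printf("%U\n", []rune(s)) // [U+FFFD]: the surrogate was replaced
	fmt.Println(encodedLen(c))
}
```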

But the correct way to handle such a value is to first decode the two surrogate halves into a single code point, and then use that valid code point.
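A sketch of that decoding with the standard unicode/utf16 package. Java strings are UTF-16, so a char of '\ud83f' is normally followed by a low surrogate; the low half 0xdc00 used here is a hypothetical value, since the question only shows the high half. The helper name decodePair is also hypothetical:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodePair combines a UTF-16 high/low surrogate pair into one code point.
// utf16.DecodeRune returns U+FFFD if the pair is not a valid surrogate pair.
func decodePair(hi, lo rune) rune {
	return utf16.DecodeRune(hi, lo)
}

func main() {
	hi := rune(0xd83f) // the high surrogate from the Java code
	lo := rune(0xdc00) // hypothetical low surrogate that would follow it
	fmt.Println(utf16.IsSurrogate(hi), utf16.IsSurrogate(lo))

	r := decodePair(hi, lo)
	fmt.Printf("%U\n", r) // U+1FC00

	// A whole UTF-16 sequence (for example, a Java char[]) can be decoded at once.
	units := []uint16{0x0041, 0xd83f, 0xdc00}
	fmt.Printf("%U\n", utf16.Decode(units))
}
```

Decoding the whole sequence with utf16.Decode is usually simpler when porting, because it handles pairing for you and replaces unpaired halves with U+FFFD.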

Tags:

unicode

go
