A rune is an alias for the int32 data type. It represents a Unicode code point. A Unicode code point (or code position) is a numerical value that is usually used to represent a Unicode character. An int32 is big enough to represent all of the roughly 140,000 characters that Unicode currently defines.
Unicode covers accents, diacritical marks, and control codes like tab and carriage return, and assigns each character a standard number called a “Unicode code point”, or in the Go language, a “rune”. The rune type is an alias of int32. Important point: always remember that a string is a sequence of bytes, not a sequence of runes.
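A minimal Go sketch of the bytes-versus-runes point: len counts bytes, utf8.RuneCountInString counts runes, indexing a string yields a single byte, and ranging over it decodes UTF-8 into runes. The helper name counts and the sample string are illustrative choices, not taken from the original answers.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper returning the number of bytes and the
// number of runes (decoded code points) in s.
func counts(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "héllo, 世界"

	// 'é' takes 2 bytes in UTF-8 and each CJK character takes 3.
	b, r := counts(s)
	fmt.Println(b, r) // 14 9

	// Indexing a string yields a byte (uint8): here the first byte of 'é'.
	fmt.Printf("%T %v\n", s[1], s[1]) // uint8 195

	// Ranging over a string decodes UTF-8 and yields runes with byte offsets.
	for i, c := range s {
		fmt.Printf("%d:%c ", i, c)
	}
	fmt.Println()
}
```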
Type uint32 in Golang is the set of all unsigned 32-bit integers. The set ranges from 0 to 4294967295.
Hence in Go, a character held as a rune is stored in the int32 data type (4 bytes in size).
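A short sketch of the claims above: rune and int32 are the same type (so no conversion is needed between them), both occupy 4 bytes, and the ranges of uint32 and int32 match the figures quoted. The helper sizes is hypothetical, written only for this illustration.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// sizes is a hypothetical helper reporting the in-memory size, in bytes,
// of a rune and of an int32.
func sizes() (runeSize, int32Size uintptr) {
	return unsafe.Sizeof(rune(0)), unsafe.Sizeof(int32(0))
}

func main() {
	// rune is an alias, not a distinct type: a rune value can be assigned
	// to an int32 variable without any conversion.
	var r rune = 'G'
	var i int32 = r
	fmt.Println(r == i) // true

	fmt.Println(sizes()) // 4 4

	// The uint32 range quoted above, and the int32 (= rune) range.
	fmt.Println(uint32(math.MaxUint32))       // 4294967295
	fmt.Println(math.MinInt32, math.MaxInt32) // -2147483648 2147483647
}
```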
I googled and found this
This has been asked several times. A rune occupies 4 bytes rather than 1 because it is meant to store Unicode code points, not just ASCII characters. As with array indices, the data type is signed so that you can easily detect overflows or other errors while doing arithmetic with those types.
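To illustrate why a signed type helps catch arithmetic mistakes, here is a minimal sketch, assuming the "error" is subtracting code points in the wrong order. With rune the result is a visibly wrong negative number that utf8.ValidRune rejects; with uint32 the same subtraction silently wraps around. The helpers runeDiff and uint32Diff are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeDiff is a hypothetical helper subtracting code points as signed runes.
func runeDiff(a, b rune) rune {
	return a - b
}

// uint32Diff is a hypothetical helper doing the same subtraction unsigned.
func uint32Diff(a, b uint32) uint32 {
	return a - b
}

func main() {
	// With a signed type, swapped operands show up as a negative number...
	d := runeDiff('a', 'z')
	fmt.Println(d) // -25

	// ...and a negative value is trivially rejected as a code point.
	fmt.Println(utf8.ValidRune(d)) // false

	// With an unsigned type the same subtraction wraps around to a huge value.
	fmt.Println(uint32Diff('a', 'z')) // 4294967271
}
```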
A valid code point never becomes negative. The Unicode codespace contains 1,114,112 code points (U+0000 through U+10FFFF), which is far below 2,147,483,647 (0x7fffffff), even counting all the reserved blocks.
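The headroom can be checked directly against the standard library constant unicode.MaxRune. This is a small sketch; the helper codespaceSize is hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"unicode"
)

// codespaceSize is a hypothetical helper returning the number of code points
// from U+0000 through unicode.MaxRune (U+10FFFF), inclusive.
func codespaceSize() int64 {
	return int64(unicode.MaxRune) + 1
}

func main() {
	fmt.Printf("%#x\n", unicode.MaxRune) // 0x10ffff
	fmt.Println(codespaceSize())        // 1114112
	fmt.Println(int64(math.MaxInt32))   // 2147483647

	// Every valid code point fits comfortably in the positive int32 range.
	fmt.Println(codespaceSize() <= math.MaxInt32) // true
}
```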
"Golang, Go : what is rune by the way?" mentioned:
With the recent Unicode 6.3, there are over 110,000 symbols defined. This requires at least 21-bit representation of each code point, so a rune is like int32 and has plenty of bits.
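The "at least 21 bits" figure follows from the largest code point, U+10FFFF. A quick sketch using math/bits.Len32; the helper bitsNeeded is hypothetical and assumes a non-negative rune.

```go
package main

import (
	"fmt"
	"math/bits"
	"unicode"
)

// bitsNeeded is a hypothetical helper returning the minimum number of bits
// needed to represent r as an unsigned value. It assumes r is non-negative.
func bitsNeeded(r rune) int {
	return bits.Len32(uint32(r))
}

func main() {
	fmt.Println(bitsNeeded(unicode.MaxRune)) // 21 (largest code point, U+10FFFF)
	fmt.Println(bitsNeeded(0x7F))            // 7 (largest ASCII code point)
	fmt.Println(bitsNeeded(0xFFFF))          // 16 (end of the Basic Multilingual Plane)
}
```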
But regarding the overflow or negative-value issues, note that the implementation of some of the unicode functions, such as unicode.IsGraphic, does include the comment “We convert to uint32 to avoid the extra test for negative”:
Code:
// Excerpt from the Go standard library's unicode package. properties, pg,
// In and GraphicRanges are defined elsewhere in that package.
const MaxLatin1 = '\u00FF' // maximum Latin-1 value.

// IsGraphic reports whether the rune is defined as a Graphic by Unicode.
// Such characters include letters, marks, numbers, punctuation, symbols, and
// spaces, from categories L, M, N, P, S, Zs.
func IsGraphic(r rune) bool {
	// We convert to uint32 to avoid the extra test for negative,
	// and in the index we convert to uint8 to avoid the range check.
	if uint32(r) <= MaxLatin1 {
		return properties[uint8(r)]&pg != 0
	}
	return In(r, GraphicRanges...)
}
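Why the uint32 conversion removes the negative test: a negative rune converted to uint32 wraps around to a value of at least 2^31, so the single upper-bound comparison rejects it as well. The sketch below compares a two-comparison check with the one-comparison trick; both helper functions are hypothetical, while unicode.MaxLatin1 and unicode.IsGraphic are the real standard library identifiers.

```go
package main

import (
	"fmt"
	"unicode"
)

// inLatin1TwoTests is a hypothetical helper using two explicit comparisons.
func inLatin1TwoTests(r rune) bool {
	return r >= 0 && r <= unicode.MaxLatin1
}

// inLatin1OneTest is a hypothetical helper using the same trick as
// unicode.IsGraphic: after conversion to uint32, a negative rune becomes a
// very large value, so one upper-bound comparison also rejects it.
func inLatin1OneTest(r rune) bool {
	return uint32(r) <= unicode.MaxLatin1
}

func main() {
	for _, r := range []rune{-1, 'A', 0xFF, 0x100} {
		fmt.Println(r, inLatin1TwoTests(r), inLatin1OneTest(r))
	}

	neg := rune(-1)
	fmt.Println(uint32(neg)) // 4294967295

	fmt.Println(unicode.IsGraphic(neg)) // false
}
```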
That may be because a rune literal is an untyped constant (as mentioned in "Go rune type explanation"): a constant such as 'a' can be stored in an int32, a uint32, even a float32, and so on, because its constant value allows it to be assigned to any of those numeric types.
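A minimal sketch of the untyped-constant behaviour described above, reading "a rune is supposed to be constant" as referring to rune literals: the same constant initializes int32, uint32 and float32 variables, and without an explicit type it takes its default type, rune, which %T prints as int32. The names c, asNumbers and defaultType are hypothetical.

```go
package main

import "fmt"

// c is an untyped rune constant with value 233 (U+00E9, 'é').
const c = 'é'

// asNumbers is a hypothetical helper storing the same constant in three
// different numeric types; each can represent the value 233 exactly.
func asNumbers() (int32, uint32, float32) {
	var i int32 = c
	var u uint32 = c
	var f float32 = c
	return i, u, f
}

// defaultType is a hypothetical helper reporting the type a variable gets
// when initialized from c without an explicit type. That type is rune, and
// %T prints it as int32 because rune is an alias for int32.
func defaultType() string {
	v := c
	return fmt.Sprintf("%T", v)
}

func main() {
	i, u, f := asNumbers()
	fmt.Println(i, u, f)       // 233 233 233
	fmt.Println(defaultType()) // int32
}
```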
In addition to the answers above, here are my two cents on why Go needed rune.
This article covers all of this in much more detail.