I have the following code, which is supposed to cast a rune into a string and print it. However, I am getting undefined characters when it is printed. I am unable to figure out where the bug is:
    package main

    import (
        "fmt"
        "strconv"
        "strings"
        "text/scanner"
    )

    func main() {
        var b scanner.Scanner
        const a = `a`
        b.Init(strings.NewReader(a))
        c := b.Scan()
        fmt.Println(strconv.QuoteRune(c))
    }
A rune literal represents a rune constant: an integer value identifying a Unicode code point. In Go, a rune literal is expressed as one or more characters enclosed in single quotes, such as 'g' or '\t'. Between the single quotes, any character may appear except a newline and an unescaped single quote.
When you convert a string to a rune slice, you get a new slice that contains the Unicode code points (runes) of the string. For an invalid UTF-8 sequence, the rune value will be 0xFFFD for each invalid byte.
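The conversion described above can be sketched as follows. This is a minimal illustration, not part of the original post; `runesOf` is a hypothetical helper name introduced only for this example.

```go
package main

import "fmt"

// runesOf is a hypothetical helper: it converts s to its Unicode code points.
// Each invalid UTF-8 byte in s becomes the replacement rune U+FFFD.
func runesOf(s string) []rune {
	return []rune(s)
}

func main() {
	// Valid UTF-8: one rune per code point, regardless of byte width.
	fmt.Println(runesOf("a⌘")) // [97 8984]

	// "\xff" is not valid UTF-8, so it is replaced by U+FFFD.
	fmt.Printf("%U\n", runesOf("a\xffb"))
}
```

Note that the length of the resulting slice is the number of code points, which can be smaller than `len(s)` when the string contains multi-byte characters.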
Code points, characters, and runes
The Unicode standard uses the term “code point” to refer to the item represented by a single value. The code point U+2318, with hexadecimal value 2318, represents the symbol ⌘.
That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads the next token or Unicode character from the source; which tokens it recognizes is controlled by the Scanner.Mode bitmask, and for a recognized token it returns one of the special constants from the text/scanner package, not the rune that was read.
To read a single rune, use Scanner.Next() instead:
    c := b.Next()
    fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
If you just want to convert a single rune to string, use a simple type conversion. rune is an alias for int32, and for converting integer numbers to string the following rule applies:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
    r := rune('a')
    fmt.Println(r, string(r))
Outputs:
97 a
Also, to loop over the runes of a string value, you can simply use the for ... range construct:
    for i, r := range "abc" {
        fmt.Printf("%d - %c (%v)\n", i, r, r)
    }
Output:
    0 - a (97)
    1 - b (98)
    2 - c (99)
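One detail worth knowing: the index produced by for ... range is the byte offset at which each rune starts, not a rune counter. For pure ASCII input the two coincide, but they diverge with multi-byte characters. A small sketch, where `runeOffsets` is a hypothetical helper added for illustration:

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it collects the byte offset
// at which each rune of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	// '⌘' (U+2318) takes 3 bytes in UTF-8, so 'b' starts at offset 4.
	for i, r := range "a⌘b" {
		fmt.Printf("%d - %c (%v)\n", i, r, r)
	}
	fmt.Println(runeOffsets("a⌘b")) // [0 1 4]
}
```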
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
Try the examples on the Go Playground.
Note:
Your original code (using Scanner.Scan()) works like this:
1. You called Scanner.Init(), which sets the Mode (b.Mode) to scanner.GoTokens.
2. Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
    c := b.Scan()
    if c == scanner.Ident {
        fmt.Println("Identifier:", b.TokenText())
    }
    // Output: "Identifier: a"