I'd like some help on understanding the unicode package's RangeTable.
Using this (supposedly helping) function:
func printChars(ranges []unicode.Range16) {
	for _, r := range ranges {
		if r.Hi >= 0x80 { // show only ascii
			break
		}
		fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
		for c := r.Lo; c <= r.Hi; c++ {
			fmt.Print(string(c) + " ")
		}
	}
	fmt.Println()
}
For digits, I can do printChars(unicode.Digit.R16), and the sequence of digits makes sense to me.
// Lo: 48 Hi: 57 Stride: 1
// 0 1 2 3 4 5 6 7 8 9
However, trying to get punctuation with printChars(unicode.Punct.R16) results in
// Lo: 33 Hi: 35 Stride: 1
// ! " #
// Lo: 37 Hi: 42 Stride: 1
// % & ' ( ) *
// Lo: 44 Hi: 47 Stride: 1
// , - . /
// Lo: 58 Hi: 59 Stride: 1
// : ;
// Lo: 63 Hi: 64 Stride: 1
// ? @
// Lo: 91 Hi: 93 Stride: 1
// [ \ ]
// Lo: 95 Hi: 123 Stride: 28
// _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {
I'm surprised that the lower case letters are included too. Also, what does "Stride" mean? It's 1 for all but the last, but the hi-lo difference varies.
As another example, take printChars(unicode.Pe.R16). I thought this should give only the end punctuation.
But instead my function prints
// Lo: 41 Hi: 93 Stride: 52
// ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]
Presumably I'm completely misunderstanding the way this is supposed to work.
How might I correctly get a list of characters in a given category, for example, Punctuation End (Pe) as above?
Stride is the step with which you have to iterate over the range: the members are Lo, Lo+Stride, Lo+2*Stride, and so on up to Hi. Let's raise the 0x80 boundary a bit and make the loop iterate using Stride:
package main

import (
	"fmt"
	"unicode"
)

func printChars(ranges []unicode.Range16) {
	for _, r := range ranges {
		if r.Hi >= 0x100 {
			break
		}
		fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
		// Step by Stride, not by 1.
		for c := r.Lo; c <= r.Hi; c += r.Stride {
			// Convert via rune; go vet flags string(c) on an integer type.
			fmt.Print(string(rune(c)) + " ")
		}
	}
	fmt.Println()
}

func main() {
	printChars(unicode.Punct.R16)
}
And here is the output:
% go run main.go
Lo: 33 Hi: 35 Stride: 1
! " #
Lo: 37 Hi: 42 Stride: 1
% & ' ( ) *
Lo: 44 Hi: 47 Stride: 1
, - . /
Lo: 58 Hi: 59 Stride: 1
: ;
Lo: 63 Hi: 64 Stride: 1
? @
Lo: 91 Hi: 93 Stride: 1
[ \ ]
Lo: 95 Hi: 123 Stride: 28
_ {
Lo: 125 Hi: 161 Stride: 36
} ¡
Lo: 167 Hi: 171 Stride: 4
§ «
Lo: 182 Hi: 183 Stride: 1
¶ ·
Lo: 187 Hi: 191 Stride: 4
» ¿
Looks pretty much correct to me.
Here is a helper function which makes it easy to iterate over all runes contained in a RangeTable:
func RunesFromRange(tab *unicode.RangeTable) <-chan rune {
	res := make(chan rune)
	go func() {
		for _, r16 := range tab.R16 {
			// Iterate as rune so c cannot wrap around when Hi is near 0xFFFF.
			for c := rune(r16.Lo); c <= rune(r16.Hi); c += rune(r16.Stride) {
				res <- c
			}
		}
		for _, r32 := range tab.R32 {
			for c := rune(r32.Lo); c <= rune(r32.Hi); c += rune(r32.Stride) {
				res <- c
			}
		}
		close(res)
	}()
	return res
}
The function can be used as follows:
for c := range RunesFromRange(unicode.Punct) {
	fmt.Printf("%04x %s\n", c, string(c))
}
Runnable code to play with is on the Go Playground (I like the characters starting with 0x0df4 in the output).