
How does Stride in unicode.RangeTable work?

Tags: unicode, go

I'd like some help understanding the unicode package's RangeTable.

I'm using this function, which was supposed to help:

func printChars(ranges []unicode.Range16) {
  for _, r := range ranges {

    if r.Hi >= 0x80 { // show only ascii
      break
    }
    fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)

    for c := r.Lo; c <= r.Hi; c++ {
      fmt.Print(string(c) + " ")
    }
  }
  fmt.Println()
}

For digits, I can call printChars(unicode.Digit.R16), and the sequence of digits makes sense to me.

 // Lo: 48 Hi: 57 Stride: 1
 // 0 1 2 3 4 5 6 7 8 9

However, for punctuation, printChars(unicode.Punct.R16) results in

 // Lo: 33 Hi: 35 Stride: 1
 // ! " #
 // Lo: 37 Hi: 42 Stride: 1
 // % & ' ( ) *
 // Lo: 44 Hi: 47 Stride: 1
 //  , - . /
 // Lo: 58 Hi: 59 Stride: 1
 // : ;
 // Lo: 63 Hi: 64 Stride: 1
 // ? @
 // Lo: 91 Hi: 93 Stride: 1
 // [ \ ]
 // Lo: 95 Hi: 123 Stride: 28
 // _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {

I'm surprised that the lower case letters are included too. Also, what does "Stride" mean? It's 1 for all but the last, but the hi-lo difference varies.

As another example, take printChars(unicode.Pe.R16). I thought this should give only the end punctuation:

  • ) right parenthesis (U+0029, Pe)
  • ] right square bracket (U+005D, Pe)
  • } right curly bracket (U+007D, Pe)

But instead my function prints

 // Lo: 41 Hi: 93 Stride: 52
 // ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]

Presumably I'm completely misunderstanding the way this is supposed to work.

How might I correctly get a list of characters in a given category, for example, Punctuation End (Pe) as above?

Kim asked Sep 14 '25 04:09


2 Answers

Stride is the step size used when iterating over the range: a Range16 contains the code points Lo, Lo+Stride, Lo+2*Stride, and so on up to Hi, not every value in between. Let's raise the 0x80 boundary a bit and make the loop advance by Stride:

package main

import (
    "fmt"
    "unicode"
)

func printChars(ranges []unicode.Range16) {
  for _, r := range ranges {

    if r.Hi >= 0x100 {
      break
    }
    fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)

    for c := r.Lo; c <= r.Hi; c+=r.Stride {
      fmt.Print(string(rune(c)) + " ") // convert via rune; go vet flags string(uint16)
    }
  }
  fmt.Println()
}

func main() {
    printChars(unicode.Punct.R16)
}

And here is the output:

% go run main.go

Lo: 33 Hi: 35 Stride: 1
! " # 
Lo: 37 Hi: 42 Stride: 1
% & ' ( ) * 
Lo: 44 Hi: 47 Stride: 1
, - . / 
Lo: 58 Hi: 59 Stride: 1
: ; 
Lo: 63 Hi: 64 Stride: 1
? @ 
Lo: 91 Hi: 93 Stride: 1
[ \ ] 
Lo: 95 Hi: 123 Stride: 28
_ { 
Lo: 125 Hi: 161 Stride: 36
} ¡ 
Lo: 167 Hi: 171 Stride: 4
§ « 
Lo: 182 Hi: 183 Stride: 1
¶ · 
Lo: 187 Hi: 191 Stride: 4
» ¿ 

Looks pretty much correct to me.

Vladimir Matveev answered Sep 15 '25 20:09


Here is a helper function which makes it easy to iterate over all runes contained in a RangeTable:

func RunesFromRange(tab *unicode.RangeTable) <-chan rune {
    res := make(chan rune)
    go func() {
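        for _, r16 := range tab.R16 {
            // Count in rune, not uint16: a uint16 counter wraps past 0xFFFF,
            // so a range with Hi == 0xFFFF would loop forever.
            for c := rune(r16.Lo); c <= rune(r16.Hi); c += rune(r16.Stride) {
                res <- c
            }
        }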
        for _, r32 := range tab.R32 {
            for c := r32.Lo; c <= r32.Hi; c += r32.Stride {
                res <- rune(c)
            }
        }
        close(res)
    }()
    return res
}

The function can be used as follows:

for c := range RunesFromRange(unicode.Punct) {
    fmt.Printf("%04x %s\n", c, string(c))
}

Runnable code to play with is on the Go Playground (I like the characters starting at 0x0df4 in the output).

jochen answered Sep 15 '25 20:09