Get Unicode category from rune

Tags:

unicode

go

rune

I'm looking for a way to get the Unicode category (RangeTable) of a rune in Go. For example, the character 'a' maps to the Ll category. The unicode package defines all of the categories (http://golang.org/pkg/unicode/#pkg-variables), but I don't see any way to look up the category of a given rune. Do I need to manually construct the RangeTable from the rune using the appropriate offsets?

asked Feb 13 '26 at 08:02 by Tyler Treat


2 Answers

The "unicode" package does not provide a function that returns the category of a rune, but it is not very tricky to build one by testing the rune against every table in unicode.Categories:

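import "unicode"

// cat returns the name of every table in unicode.Categories that contains r.
// The map holds both one-letter major categories and two-letter ones, so
// 'a' yields both "L" and "Ll". Map iteration order is random, so the order
// of the returned names is not stable.
func cat(r rune) (names []string) {
    for name, table := range unicode.Categories {
        if unicode.Is(table, r) {
            names = append(names, name)
        }
    }
    return
}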
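A minimal usage sketch, assuming you want readable, repeatable output: it copies cat from the answer above and sorts the names before printing, because Go randomizes map iteration order. For 'a' the result should contain both "L" and "Ll", since unicode.Categories lists the major categories alongside the two-letter ones.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// cat collects the name of every table in unicode.Categories that contains r.
func cat(r rune) (names []string) {
	for name, table := range unicode.Categories {
		if unicode.Is(table, r) {
			names = append(names, name)
		}
	}
	return
}

func main() {
	for _, r := range []rune{'a', 'A', '5', ' '} {
		names := cat(r)
		// Map iteration order is random, so sort for stable output.
		sort.Strings(names)
		fmt.Printf("%q: %v\n", r, names)
	}
}
```

If you only need one answer per rune, filter the names down to the two-letter general category, as the next answer does.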
answered Feb 16 '26 at 05:02 by Alex Netkachov


Here is an alternative version, based on the accepted answer, that returns a single two-letter Unicode general category such as "Ll":

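import "unicode"

// UnicodeCategory returns the two-letter Unicode general category of r,
// for example "Ll" for 'a'. One-letter major categories such as "L" are
// skipped, as is "LC" (cased letter), which some Go releases may include
// and which overlaps Lu, Ll and Lt. Runes that match no table are
// reported as "Cn" (unassigned).
func UnicodeCategory(r rune) string {
    for name, table := range unicode.Categories {
        if len(name) == 2 && name != "LC" && unicode.Is(table, r) {
            return name
        }
    }
    return "Cn"
}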