I am trying to count "characters" in go. That is, if a string contains one printable "glyph", or "composed character" (or what someone would ordinarily think of as a character), I want it to count 1. For example, the string "Hello, δΈππΏπη", should count 11, since there are 11 characters, and a human would look at this and say there are 11 glyphs.
utf8.RuneCountInString() works well in most cases, including ascii, accents, asian characters and even emojis. However, as I understand it runes correspond to code points, not characters. When I try to use basic emojis it works, but when I use emojis that have different skin tones, I get the wrong count: https://play.golang.org/p/aFIGsB6MsO
From what I read here and here the following should work, but I still don't seem to be getting the right results (it over-counts):
func CountCharactersInString(str string) int {
var ia norm.Iter
ia.InitString(norm.NFC, str)
nc := 0
for !ia.Done() {
nc = nc + 1
ia.Next()
}
return nc
}
This doesn't work either:
func GraphemeCountInString(str string) int {
re := regexp.MustCompile("\\PM\\pM*|.")
return len(re.FindAllString(str, -1))
}
I am looking for something similar to this in Objective C:
+ (NSInteger)countCharactersInString:(NSString *) string {
// --- Calculate the number of characters enterd by user and update character count label
NSInteger count = 0;
NSUInteger index = 0;
while (index < string.length) {
NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
count++;
index += range.length;
}
return count;
}
I wrote a package that allows you to do this: https://github.com/rivo/uniseg. It breaks strings according to the rules specified in Unicode Standard Annex #29 which is what you are looking for. Here is how you would use it in your case:
package main
import (
"fmt"
"github.com/rivo/uniseg"
)
func main() {
fmt.Println(uniseg.GraphemeClusterCount("Hello, δΈππΏπη"))
}
This will print 11
as you expect.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With