Unicode text segmentation requires access to the Grapheme_Cluster_Break property of characters, which JavaScript famously doesn't expose directly. I was hoping to work around this with Unicode property escapes in a regexp, but it doesn't seem to be as simple as /\p{Grapheme_Cluster_Break=Extend}/u or something similar. You can write \p{Grapheme_Extend}, but that tests a different property.
Is there a way to trick JavaScript runtimes into giving me information about characters' Grapheme_Cluster_Break value through property escapes? (And if not, why not?)
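To make the problem concrete, here is a minimal sketch of what I am seeing. The helper name compilesWithUnicodeFlag is made up for this illustration, and the results assume a runtime that supports Unicode property escapes with reasonably recent Unicode data (Unicode 11 or later for the emoji modifier check).

```javascript
// Hypothetical helper: returns true if the pattern source compiles with the `u` flag.
function compilesWithUnicodeFlag(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    // An unsupported property name or value surfaces as a SyntaxError.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// The escape I would like to use is rejected by the regexp parser.
const supportsGcbEscape = compilesWithUnicodeFlag("\\p{Grapheme_Cluster_Break=Extend}");

// The binary property Grapheme_Extend is accepted.
const supportsGraphemeExtend = compilesWithUnicodeFlag("\\p{Grapheme_Extend}");

// Grapheme_Extend is not the same set as Grapheme_Cluster_Break=Extend:
// per UAX #29, emoji modifiers have GCB=Extend, yet they are not Grapheme_Extend.
const graphemeExtend = /\p{Grapheme_Extend}/u;
const combiningAcute = "\u0301"; // COMBINING ACUTE ACCENT
const emojiModifier = "\u{1F3FB}"; // EMOJI MODIFIER FITZPATRICK TYPE-1-2

console.log({
  supportsGcbEscape,
  supportsGraphemeExtend,
  acuteIsGraphemeExtend: graphemeExtend.test(combiningAcute),
  modifierIsGraphemeExtend: graphemeExtend.test(emojiModifier),
  modifierIsEmojiModifier: /\p{Emoji_Modifier}/u.test(emojiModifier),
});
```

So \p{Grapheme_Extend} matches the combining accent but misses the emoji modifier, which is why it can't simply stand in for Grapheme_Cluster_Break=Extend.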