Let's say I have a QString that may contain any Unicode characters, and I want to iterate through its characters or count them. By "characters" I mean what the user perceives as such (so roughly equivalent to "glyphs"), not simply QChars (16-bit UTF-16 code units). Some "actual" characters are built from several QChars (surrogate pairs; base character + combining marks). For some combining characters I might get away with normalizing the string to create composite characters, but that does not always help.
Have I overlooked a built-in function that splits a QString into "actual" characters?
Or if I have to parse it myself, is this the structure (in EBNF) or am I missing something?
character = ((high_surrogate, low_surrogate) | base_character), {combining_mark}
(with base_character being every QChar that is not a surrogate or combining character)
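To make the grammar above concrete, here is a minimal sketch in plain C++ that counts "characters" exactly as the EBNF describes them. It works on a std::u16string as a stand-in for the QChar sequence. The function names and the combining-mark test are my own assumptions: isCombiningMark only checks a handful of well-known combining blocks and is not a full Unicode category lookup. Note also that the grammar itself is a simplification: it does not cover cases such as Hangul jamo sequences, emoji joined with ZERO WIDTH JOINER, or regional-indicator flag pairs.

```cpp
#include <cstddef>
#include <string>

// UTF-16 surrogate ranges (these are fixed by the encoding).
bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// ASSUMPTION: a rough approximation that only covers a few
// dedicated combining-mark blocks, not every combining character.
bool isCombiningMark(char16_t c)
{
    return (c >= 0x0300 && c <= 0x036F)   // Combining Diacritical Marks
        || (c >= 0x1AB0 && c <= 0x1AFF)   // ... Extended
        || (c >= 0x1DC0 && c <= 0x1DFF)   // ... Supplement
        || (c >= 0x20D0 && c <= 0x20FF)   // ... for Symbols
        || (c >= 0xFE20 && c <= 0xFE2F);  // Combining Half Marks
}

// Counts units of the form
//   ((high_surrogate, low_surrogate) | base_character), {combining_mark}
// A stray surrogate or a leading combining mark is treated as a base.
std::size_t countCharacters(const std::u16string &s)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        // Consume either a surrogate pair or a single code unit.
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
            i += 2;
        else
            i += 1;
        // Consume any combining marks attached to it.
        while (i < s.size() && isCombiningMark(s[i]))
            ++i;
        ++count;
    }
    return count;
}
```

With this sketch, "e" followed by U+0301 counts as one character, and so does a single code point outside the BMP that is stored as a surrogate pair.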
After more research I found the term for "actual character", grapheme, and with it the Qt class for finding grapheme boundaries: QTextBoundaryFinder.