Finding "actual" characters (graphemes) in a QString

Question

Let's say I have a QString that may contain any Unicode characters, and I want to iterate through its characters or count them. By "characters" I mean what the user perceives as such (roughly equivalent to "glyphs"), not simply QChars (16-bit UTF-16 code units). Some "actual" characters are made of several QChars (surrogate pairs; a base character plus combining marks). For some combining characters I might get away with normalizing the string to create composite characters, but that does not always help.

Have I overlooked a built-in function that splits a QString into "actual" characters?

Or if I have to parse it myself, is this the structure (in EBNF) or am I missing something?

character = ((high_surrogate, low_surrogate) | base_character), {combining_mark}

(with base_character being every QChar that is neither a surrogate nor a combining mark)
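Here is a rough sketch of the grammar above. It is written against std::u16string, which, like QString, stores UTF-16 code units, so it compiles without Qt. The helpers splitByGrammar and isCombiningMark are hypothetical. isCombiningMark is deliberately simplified to the Combining Diacritical Marks block (U+0300 to U+036F). In Qt code, QChar::isHighSurrogate(), QChar::isLowSurrogate() and QChar::isMark() would be the natural building blocks instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified assumption: only the Combining Diacritical Marks block counts as a mark.
// Real combining marks are spread across many other blocks.
bool isCombiningMark(char16_t c) { return c >= 0x0300 && c <= 0x036F; }
bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Splits UTF-16 text according to
//   character = ((high_surrogate, low_surrogate) | base_character), {combining_mark}
std::vector<std::u16string> splitByGrammar(const std::u16string &text)
{
    std::vector<std::u16string> result;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t start = i;
        if (isHighSurrogate(text[i]) && i + 1 < text.size() && isLowSurrogate(text[i + 1])) {
            i += 2; // (high_surrogate, low_surrogate): one code point
        } else {
            // base_character; an unpaired surrogate or a leading mark, which the
            // grammar does not cover, is also taken as a one-unit base here
            i += 1;
        }
        while (i < text.size() && isCombiningMark(text[i]))
            ++i; // {combining_mark}
        result.push_back(text.substr(start, i - start));
    }
    return result;
}
```

Even with a complete mark test, this grammar diverges from Unicode's extended grapheme clusters (Unicode Standard Annex #29). "\r\n" comes out as two pieces, and an emoji ZWJ sequence such as a family emoji comes out as several, while UAX #29 treats each as a single cluster. Hangul jamo sequences are another case the grammar misses.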

Sebastian Negraszus · Accepted Answer

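After more research I found the term for an "actual character": grapheme, or more precisely grapheme cluster, as defined in Unicode Standard Annex #29. With it I found the Qt class for finding grapheme boundaries: QTextBoundaryFinder, used with the QTextBoundaryFinder::Grapheme boundary type.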

Tags: unicode, utf-16, qt