Length of UTF-32 character in Qt

I'm using Qt5. I have a QString holding one character, U"\x1D4CC" (𝓌), whose code point does not fit in 16 bits. Even though this is only one character, Qt reports that the size of this string is 2. Is there any way to find out how many real characters a QString has, assuming that it can contain 32-bit characters?
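The discrepancy can be reproduced without Qt. This sketch uses only standard C++11 string types (an assumption made so it compiles anywhere): std::u16string stores UTF-16 code units the same way QString stores QChars, so the same code point takes two units there but one in a std::u32string. The names asUtf16 and asUtf32 are illustrative only.

```cpp
#include <string>

// U+1D4CC written two ways. std::u16string holds UTF-16 code units,
// as QString holds QChars; std::u32string holds one unit per code point.
const std::u16string asUtf16 = u"\U0001D4CC"; // surrogate pair: 0xD835, 0xDCCC
const std::u32string asUtf32 = U"\x1D4CC";    // single unit: 0x1D4CC
```

Here asUtf16.size() is 2, matching what QString::length() reports, while asUtf32.size() is 1, the single character the question expects.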

asked Apr 14 '16 at 12:04 by user1781713

1 Answer

Unicode characters with code points above 65535 (U+FFFF) are stored using surrogate pairs, i.e., two consecutive QChars. QString::length() returns the number of QChars in the string, which may differ both from the number of code points and from the number of graphemes (user-perceived "real" characters).
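The pairing itself is simple arithmetic: the code point is offset by 0x10000 and the remaining 20 bits are split across two 16-bit units. A plain C++ sketch follows; the helper names toSurrogatePair and fromSurrogatePair are hypothetical (Qt ships comparable static helpers, QChar::highSurrogate(), QChar::lowSurrogate() and QChar::surrogateToUcs4()).

```cpp
#include <utility>

// Split a supplementary code point (precondition: U+10000..U+10FFFF)
// into a UTF-16 surrogate pair.
std::pair<char16_t, char16_t> toSurrogatePair(char32_t cp) {
    const char32_t v = cp - 0x10000;                         // 20-bit offset
    return { static_cast<char16_t>(0xD800 + (v >> 10)),      // high: top 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };  // low: bottom 10 bits
}

// Recombine a high/low surrogate pair into the original code point.
char32_t fromSurrogatePair(char16_t hi, char16_t lo) {
    return 0x10000 + ((char32_t(hi) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
}
```

For U+1D4CC this yields the pair 0xD835, 0xDCCC: two QChars for one character.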

To count graphemes, you can use the QTextBoundaryFinder class.

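#include <QDebug>
#include <QTextBoundaryFinder>

int main() {
    QString str = QString::fromUtf8("\xF0\x9D\x93\x8C"); // "𝓌" (U+1D4CC), independent of source-file encoding
    QTextBoundaryFinder finder(QTextBoundaryFinder::Grapheme, str);
    int count = 0;
    while (finder.toNextBoundary() != -1) // each boundary after position 0 ends one grapheme
        ++count;
    qDebug() << count; // 1
}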

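Alternatively, you can convert the string to its UCS-4/UTF-32 representation and count the 32-bit code points. Note that this counts code points, not graphemes: a base letter followed by a combining mark is two code points but one grapheme.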

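// Assumes str from the previous snippet; needs #include <QVector>.
QVector<uint> ucs4 = str.toUcs4(); // one uint per code point: here just 0x1D4CC
qDebug() << ucs4.size(); // 1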
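For comparison, this plain C++ sketch counts code points directly in UTF-16 data, which is roughly what str.toUcs4().size() reports. countCodePoints is a hypothetical helper; it counts a lone (unpaired) surrogate as one unit, on the assumption that this mirrors toUcs4() replacing invalid sequences with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in UTF-16 text: a well-formed surrogate pair
// counts once; every other code unit (including a lone surrogate) counts once.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i; // skip the low half of the pair
        ++count;
    }
    return count;
}
```

countCodePoints(u"e\u0301") returns 2 (a letter plus a combining acute accent), whereas a grapheme count with QTextBoundaryFinder would report 1 for the same text, so the two approaches are not interchangeable.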
answered Oct 11 '22 at 15:10 by Meefte