Retrieve Unicode code points > U+FFFF from QChar

I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc.

Now I need the code point of a QChar in order to look up some data in http://unicode.org/Public/UNIDATA/UnicodeData.txt, but QChar's unicode() method only returns a ushort (unsigned short), which usually is a number from 0 to 65535 (or 0xFFFF). There are characters with code points > 0xFFFF, so how do I get these? Is there some trick I am missing or is this currently not supported by Qt/QChar?

Sebastian Negraszus asked Aug 07 '11 12:08



2 Answers

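Each QChar is a single UTF-16 code unit, not a complete Unicode code point. Therefore, a non-BMP character (code point > U+FFFF) is stored as a surrogate pair: two consecutive QChars, a high surrogate followed by a low surrogate. To get the code point, you have to combine both halves.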
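To make the pairing concrete, here is a minimal sketch in plain C++ (no Qt needed) that recombines a surrogate pair into a code point. The helper names isHighSurrogate, isLowSurrogate, combineSurrogates and toCodePoints are made up for this illustration; they only follow the UTF-16 arithmetic and are not Qt's implementation. In Qt itself, QChar::isHighSurrogate(), QChar::isLowSurrogate() and the static QChar::surrogateToUcs4() should cover the same steps, and QString::toUcs4() converts a whole string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers: they test the top 6 bits of a UTF-16 code unit.
inline bool isHighSurrogate(char16_t u) { return (u & 0xFC00) == 0xD800; }
inline bool isLowSurrogate(char16_t u)  { return (u & 0xFC00) == 0xDC00; }

// Combine a high/low surrogate pair into a single code point.
// Each surrogate carries 10 bits of the value (code point - 0x10000).
inline char32_t combineSurrogates(char16_t high, char16_t low)
{
    assert(isHighSurrogate(high) && isLowSurrogate(low));
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}

// Walk a UTF-16 string and collect full code points.
std::vector<char32_t> toCodePoints(const std::u16string &text)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        if (isHighSurrogate(unit) && i + 1 < text.size()
            && isLowSurrogate(text[i + 1])) {
            out.push_back(combineSurrogates(unit, text[i + 1]));
            ++i; // the low surrogate has been consumed as well
        } else {
            // BMP character; an unpaired surrogate is passed through as-is.
            out.push_back(unit);
        }
    }
    return out;
}
```

The resulting value is what you would look up in UnicodeData.txt. Note that this sketch does no error reporting for unpaired surrogates; a real application may want to reject or replace them.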

Delan Azabani answered Sep 22 '22 09:09



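The solution appears to lie in functions that are documented but rarely shown on the Web. Start from the code point as a plain integer (this is the Unicode code point, not a UTF-8 value). Then call QChar::requiresSurrogate() to determine whether a single QChar is large enough to hold it. In this case it is not, so you need to create two QChars: a high surrogate and a low surrogate.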

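// Needs <QChar> and <QString>.
uint32_t cp = 155222; // U+25E56, a CJK ideograph (4 bytes in UTF-8)
QString str;
if (QChar::requiresSurrogate(cp))
{
    // Outside the BMP: store the code point as a high/low surrogate pair.
    QChar charArray[2];
    charArray[0] = QChar::highSurrogate(cp);
    charArray[1] = QChar::lowSurrogate(cp);
    str = QString(charArray, 2);
}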

The resulting QString will contain the surrogate pair needed to display your supplementary (non-BMP) character.
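For reference, the arithmetic behind the split can be sketched in plain C++ without Qt. The names requiresSurrogates and toSurrogates are hypothetical stand-ins chosen for this example; they follow the UTF-16 encoding rules and are not copied from Qt's source.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the check: anything above U+FFFF needs a pair.
inline bool requiresSurrogates(char32_t cp) { return cp >= 0x10000; }

// Split a supplementary code point into {high surrogate, low surrogate}.
inline std::pair<char16_t, char16_t> toSurrogates(char32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const char32_t offset = cp - 0x10000; // 20 significant bits remain
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));   // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    return {high, low};
}
```

For the code point 155222 (U+25E56) used above, this gives the pair 0xD857, 0xDE56.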

A. Penner answered Sep 21 '22 09:09
