I know that to get a unicode character in C++ I can do:
std::wstring str = L"\u4FF0";
However, what if I want to get all the characters in the range 4FF0 to 5FF0? Is it possible to dynamically build a unicode character? What I have in mind is something like this pseudo-code:
for (int i = 20464; i < 24560; i++) { // From 4FF0 to 5FF0
    std::wstring str = L"\u" + hexa(i); // build the unicode character
    // do something with str
}
How would I do that in C++?
The wchar_t type held within a wstring is an integer type, so you can use it directly:
for (wchar_t c = 0x4ff0; c <= 0x5ff0; ++c) {
    std::wstring str(1, c);
    // do something with str
}
Be careful when doing this with code points above 0xffff: on platforms where wchar_t is only 16 bits wide (e.g. Windows), they will not fit into a single wchar_t.
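One way to sidestep the width of wchar_t, assuming a C++11 compiler, is to iterate with char32_t, which is wide enough for any code point. The sketch below is only an illustration: `code_point_range` is a hypothetical helper name, and it assumes the caller keeps `last` at or below 0x10FFFF.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns one single-code-point string per value
// in the inclusive range [first, last].
// Assumes last <= 0x10FFFF (the highest Unicode code point).
std::vector<std::u32string> code_point_range(char32_t first, char32_t last) {
    std::vector<std::u32string> out;
    for (char32_t c = first; c <= last; ++c) {
        // 0xD800..0xDFFF are surrogates, not characters, so skip them.
        if (c >= 0xD800 && c <= 0xDFFF) continue;
        out.push_back(std::u32string(1, c));
    }
    return out;
}
```

Calling `code_point_range(0x4FF0, 0x5FF0)` yields the strings for the range in the question. Converting them to UTF-8 or UTF-16 for display is a separate step.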
If, for example, you wanted to build a string containing the Emoticons block (0x1f600 to 0x1f64f), you can create surrogate pairs where wchar_t is too narrow:
std::wstring str;
for (int c = 0x1f600; c <= 0x1f64f; ++c) {
    if (c <= 0xffff || sizeof(wchar_t) > 2)
        str.append(1, (wchar_t)c);
    else {
        str.append(1, (wchar_t)(0xd800 | ((c - 0x10000) >> 10)));
        str.append(1, (wchar_t)(0xdc00 | ((c - 0x10000) & 0x3ff)));
    }
}
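To make the surrogate arithmetic easier to check, here is the same calculation as a small standalone function that targets std::u16string instead of std::wstring. This is a sketch, not part of the original answer: `to_utf16` is a hypothetical name, and the function assumes its input is a valid code point (at most 0x10FFFF and not itself a surrogate).

```cpp
#include <string>

// Hypothetical helper: encodes one code point as UTF-16.
// Assumes cp is a valid code point (<= 0x10FFFF, not a surrogate).
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp <= 0xFFFF) {
        // Fits in a single 16-bit code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Subtract 0x10000, leaving a 20-bit value.
        char32_t v = cp - 0x10000;
        // High surrogate carries the top 10 bits.
        out.push_back(static_cast<char16_t>(0xD800 | (v >> 10)));
        // Low surrogate carries the bottom 10 bits.
        out.push_back(static_cast<char16_t>(0xDC00 | (v & 0x3FF)));
    }
    return out;
}
```

For 0x1f600 the subtraction leaves 0xF600, whose top 10 bits are 0x3D and bottom 10 bits are 0x200, so the result is the pair 0xD83D, 0xDE00.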
You cannot simply iterate over Unicode characters as if they were array elements. A single code point can take several 'char's in UTF-8 or two 'WCHAR's in UTF-16, and what the user perceives as one character may itself be built up from several code points, such as a base letter followed by combining diacritics. If you're really serious about this stuff, you should use an API like Uniscribe or ICU.
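As a small illustration of why counting code units or code points is not the same as counting characters, the sketch below stores the same visible letter in two ways and an emoji in UTF-16. The variable names are made up for this example, and it assumes a C++11 compiler.

```cpp
#include <string>

// Letter e with acute accent as one precomposed code point, U+00E9.
const std::u32string precomposed = U"\u00E9";

// The same visible letter as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
const std::u32string decomposed = U"e\u0301";

// One code point above 0xFFFF, stored in UTF-16 as a surrogate pair.
const std::u16string emoji_utf16 = u"\U0001F600";
```

Here `precomposed.size()` is 1 while `decomposed.size()` is 2, and the two strings compare unequal even though they render the same. `emoji_utf16.size()` is 2 for a single code point. Treating such strings as equal requires normalization, which is what libraries like ICU provide.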
Some resources to read:
http://en.wikipedia.org/wiki/UTF-16/UCS-2
http://en.wikipedia.org/wiki/Precomposed_character
http://en.wikipedia.org/wiki/Combining_character
http://scripts.sil.org/cms/scripts/page.php?item_id=UnicodeNames#4d2aa980
http://en.wikipedia.org/wiki/Unicode_equivalence
http://msdn.microsoft.com/en-us/library/dd374126.aspx