
Converting UTF-8 Characters to Upper/Lower Case in C++

I have a string that contains UTF-8 characters, and I have a method that is supposed to convert every character to either upper or lower case. This is easy for characters that overlap with ASCII, and obviously some characters cannot be converted at all (e.g. any Chinese character). However, is there a good way to detect and convert other characters that have upper/lower case forms, e.g. all the Greek characters? Also, please note that I need this to work on both Windows and Linux.

Thank you,

asked Sep 08 '10 at 23:09 by NSA



2 Answers

Have a look at ICU (International Components for Unicode).

Note that case-conversion functions are locale-dependent. Consider Turkish: the (ASCII) letter I lowercases to a "dotless lowercase i" (ı), and the (ASCII) letter i uppercases to an "uppercase I with a dot" (İ).
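A minimal sketch of why one locale-independent mapping cannot be correct for every language. The helpers `upper_cp` and `lower_cp` and their `turkish` flag are hypothetical, not part of ICU or the C++ standard library; they cover only ASCII letters plus the Turkish i/I rule described above.

```cpp
// Hypothetical helpers: ASCII-only case mapping plus the Turkish i/I rule.
// Real code should rely on a library such as ICU instead of hand-written tables.

// Uppercase one code point; in Turkish, i becomes U+0130 (capital I with dot).
char32_t upper_cp(char32_t c, bool turkish) {
    if (turkish && c == U'i') return U'\u0130';
    if (c >= U'a' && c <= U'z') return c - (U'a' - U'A');
    return c;  // everything else is left unchanged in this sketch
}

// Lowercase one code point; in Turkish, I becomes U+0131 (dotless i).
char32_t lower_cp(char32_t c, bool turkish) {
    if (turkish && c == U'I') return U'\u0131';
    if (c >= U'A' && c <= U'Z') return c + (U'a' - U'A');
    return c;  // everything else is left unchanged in this sketch
}
```

The same input gives different results depending on the flag: `upper_cp(U'i', false)` yields `U'I'`, while `upper_cp(U'i', true)` yields `U'\u0130'`. A full library exposes this choice as a locale parameter rather than a boolean.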

answered Sep 24 '22 at 18:09 by Alexandre C.



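Assuming you have access to wctype.h, convert your text to a wide-character (wchar_t) string, call towupper() on each character, and then convert the result back to UTF-8. Note that wchar_t is 2 bytes (UTF-16) on Windows but typically 4 bytes on Linux.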
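A minimal sketch of that recipe, assuming valid UTF-8 input (there is no error handling). Instead of a 2-byte string it decodes to `char32_t` code points, because `wchar_t` has different sizes on Windows and Linux; the helper names `decode_utf8`, `encode_utf8` and `to_upper_utf8` are hypothetical.

```cpp
#include <cwchar>
#include <cwctype>
#include <string>

// Decode UTF-8 into code points. Sketch only: assumes the input is valid UTF-8.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)      { cp = c;        len = 1; }  // ASCII
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }  // 2-byte sequence
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }  // 3-byte sequence
        else               { cp = c & 0x07; len = 4; }  // 4-byte sequence
        // Append 6 payload bits from each continuation byte.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encode code points back into UTF-8 bytes.
std::string encode_utf8(const std::u32string& cps) {
    std::string out;
    for (char32_t cp : cps) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Uppercase a UTF-8 string with towupper(); the result depends on the current C locale.
std::string to_upper_utf8(const std::string& s) {
    std::u32string cps = decode_utf8(s);
    for (char32_t& cp : cps) {
        // wchar_t is 16-bit on Windows, so code points above WCHAR_MAX are left alone there.
        if (cp <= static_cast<char32_t>(WCHAR_MAX))
            cp = static_cast<char32_t>(std::towupper(static_cast<std::wint_t>(cp)));
    }
    return encode_utf8(cps);
}
```

Because towupper() is locale-dependent, call `std::setlocale(LC_ALL, ...)` with a UTF-8 locale before using it if non-ASCII letters such as Greek should be converted; which locale names exist (for example "en_US.UTF-8") varies by system and is an assumption here. towupper() also maps one character to exactly one character, so it cannot handle cases like ß becoming "SS", which ICU does handle.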

answered Sep 25 '22 at 18:09 by tidwall
