
switch statement matching non-ascii characters

My source code contains accented characters, and I have tried replacing them with their Unicode escapes. The program compiles and works properly if I use the actual non-ASCII character, but I'm concerned this may hurt portability. When I use the Unicode escape instead, I get "warning: case label value exceeds maximum value for type" or "warning: character constant too long for its type", and the case is never matched when I run the program.

for(int i = 0; i < ent->d_namlen; i++)
{
    switch(ent->d_name[i])
    {
        case 'á' : //0x00E1
        ...
    }
}

ent is a struct dirent * that gets passed in from the calling function.

In place of case 'á' : I've tried case '0x00E1' :, case L 'u00E1 :, case \U000000E9 : and case '\u00E1' :. I've also tried all of them without the single quotes, in which case it won't compile (e.g. it says that \u00E1 was not declared in this scope).

Celeritas asked Aug 21 '12 23:08


1 Answer

á is a non-ASCII character and is being represented as multiple bytes in either your source code, the struct dirent, or both.

If you compile with -Wmultichar (GCC enables it by default) you will probably get the warning

warning: multi-character character constant

indicating that the character constant 'á' consists of more than one byte, in which case your source file is probably encoded in UTF-8, but check (e.g. using the file command). You'll also need to find out the encoding of the dirent entries.
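One way to find out what is actually stored in a name is to dump its bytes. The sketch below assumes nothing about the encoding; hex_bytes is a hypothetical helper name, not part of any library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of `s` as two hex digits,
// separated by spaces, so multi-byte sequences become visible.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling it on std::string(ent->d_name) and printing the result should show á as C3 A1 if the names are UTF-8, or as a single E1 if they are Latin-1.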

In order to match non-ASCII characters in a string you need to:

  • make sure that the string and the character are represented in the same encoding, and either
    • use a fixed-length encoding (e.g. UCS-4, also known as UTF-32) and a type sufficiently wide to store each codepoint (e.g. char32_t or int), or
    • use a restartable variable-length encoding (e.g. UTF-8) and use substring matching.
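The substring-matching option can be sketched as follows, assuming the directory names are UTF-8. The names count_occurrences and kAAcute are made up for this illustration. The needle is spelled with explicit byte escapes so the result does not depend on how the source file is encoded.

```cpp
#include <cstddef>
#include <string>

// "á" (U+00E1) written out as its two UTF-8 bytes.
const std::string kAAcute = "\xC3\xA1";

// Hypothetical helper: count non-overlapping occurrences of the byte
// sequence `needle` in `name`. Both are assumed to be UTF-8.
std::size_t count_occurrences(const std::string& name, const std::string& needle) {
    if (needle.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = name.find(needle); pos != std::string::npos;
         pos = name.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}
```

With the loop from the question, this would be called as count_occurrences(std::string(ent->d_name), kAAcute) instead of switching on individual bytes. Because UTF-8 is self-synchronizing, a byte-wise search for a complete encoded character cannot match in the middle of a different character.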

Look at http://en.cppreference.com/w/cpp/locale/codecvt_utf8 for an example of how to do the conversions.
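Since std::codecvt_utf8 was deprecated in C++17, here is a hand-written sketch of the fixed-width route instead: decode UTF-8 into char32_t code points, then switch on whole code points. It assumes C++11 and UTF-8 input. decode_utf8 and count_accented_a are hypothetical names, and the decoder is deliberately minimal: it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Minimal UTF-8 decoder sketch. Invalid lead bytes and truncated or
// malformed sequences are replaced by U+FFFD.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(U'\uFFFD'); ++i; continue; }  // not a valid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // sequence ended early
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : U'\uFFFD');
    }
    return out;
}

// Count accented lowercase a's by switching on code points, not bytes.
int count_accented_a(const std::string& utf8_name) {
    int n = 0;
    for (char32_t cp : decode_utf8(utf8_name)) {
        switch (cp) {
            case U'\u00E1':  // a with acute accent
            case U'\u00E0':  // a with grave accent
                ++n;
                break;
            default:
                break;
        }
    }
    return n;
}
```

The case labels use the U'...' prefix so each constant has type char32_t and holds the full code point, which avoids the "case label value exceeds maximum value for type" warning that appears when switching on a plain char.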

ecatmur answered Nov 15 '22 06:11