switch statement matching non-ascii characters

Question

I have accented characters in my source code and have tried replacing them with their Unicode escape equivalents. The program compiles and works properly if I use the actual non-ASCII character, but I'm concerned this may hurt portability. When I use the Unicode escape instead, I get "warning: case label value exceeds maximum value for type" or "warning: character constant too long for its type", and the case is never matched when I run the program.

for(int i = 0; i < ent->d_namlen; i++)
{
    switch(ent->d_name[i])
    {
        case 'á' : //0x00E1
        ...
    }
}

ent is a struct dirent *ent that is passed in from a calling function.

In place of case 'á' : I've tried case '0x00E1' :, case L'\u00E1' :, case \U000000E9 : and case '\u00E1' :. I've also tried all of them without the single quotes, in which case it won't compile (e.g. it says that \u00E1 was not declared in this scope).

ecatmur · Accepted Answer

á is a non-ASCII character and is being represented as multiple bytes in your source code, in the struct dirent, or in both.

If you turn on -Wmultichar, you will probably get the warning

warning: multi-character character constant

indicating that the character constant 'á' consists of more than one byte. In that case your source file is probably UTF-8, but check (e.g. using the file command). You'll also need to find out the encoding of the dirent entries.
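As a quick check of the multi-byte point, here is a minimal sketch (assuming a C++11 or later compiler) that lists the bytes the compiler stores for á when the literal is forced to UTF-8 with the u8 prefix. It yields the two bytes 0xC3 0xA1, so in UTF-8 data no single char element, and therefore no single case label in a switch over char, can equal á. The function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the bytes stored for the literal u8"\u00E1" (á).
// The u8 prefix forces UTF-8 regardless of the source file's encoding.
// In C++20 the elements are char8_t instead of char; the cast below works for both.
std::vector<unsigned> utf8_bytes_of_a_acute() {
    const auto& s = u8"\u00E1";  // reference to the whole array, terminator included
    std::vector<unsigned> bytes;
    for (std::size_t i = 0; i + 1 < sizeof s; ++i) {  // i + 1 skips the trailing '\0'
        bytes.push_back(static_cast<unsigned char>(s[i]));
    }
    return bytes;  // {0xC3, 0xA1}: two bytes for one character
}
```

This is also why '\u00E1' in a switch over char triggers the warnings from the question: the compiler has to squeeze a two-byte UTF-8 sequence into a single character constant, and the resulting (implementation-defined) value is outside the range of char.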

In order to match non-ASCII characters in a string you need to:

make sure that the string and the character are represented in the same encoding, and either
- use a fixed-length encoding (e.g. UCS-4) and a type wide enough to store each code point (e.g. char32_t or int), or
- use a self-synchronizing variable-length encoding (e.g. UTF-8) and use substring matching.
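Here is a minimal sketch of both options in C++11 or later, assuming the file names are UTF-8 (check this on your platform). decode_utf8, count_a_acute and contains_a_acute are hypothetical names, not library functions. decode_utf8 is a small hand-written decoder that turns the bytes into char32_t code points, so a switch can use U'\u00E1' as an ordinary case label; contains_a_acute shows the second option, searching for the UTF-8 byte sequence of á rather than a single char.

```cpp
#include <cstddef>
#include <string>

// Minimal UTF-8 decoder (option 1: fixed-width code points).
// Assumes mostly valid input: continuation bytes, overlong forms and
// surrogates are not validated. A bad lead byte or a truncated sequence
// becomes U+FFFD.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)             { cp = c;        len = 1; }  // 0xxxxxxx
        else if ((c >> 5) == 0x6)  { cp = c & 0x1F; len = 2; }  // 110xxxxx
        else if ((c >> 4) == 0xE)  { cp = c & 0x0F; len = 3; }  // 1110xxxx
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }  // 11110xxx
        else { out.push_back(U'\uFFFD'); ++i; continue; }       // not a lead byte
        if (i + len > in.size()) { out.push_back(U'\uFFFD'); break; }
        for (std::size_t k = 1; k < len; ++k) {
            // Append the low 6 bits of each continuation byte.
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical example: count the á characters in a name by switching on code points.
int count_a_acute(const std::string& name) {
    int n = 0;
    for (char32_t cp : decode_utf8(name)) {
        switch (cp) {
            case U'\u00E1':  // á is one code point here, so one case label matches it
                ++n;
                break;
            default:
                break;
        }
    }
    return n;
}

// Option 2: keep the UTF-8 bytes and match the two-byte sequence of á as a substring.
bool contains_a_acute(const std::string& name) {
    return name.find("\xC3\xA1") != std::string::npos;
}
```

With the code from the question, this might be called as count_a_acute(std::string(ent->d_name, ent->d_namlen)). d_namlen is not available on every platform (glibc's struct dirent lacks it, for example); there, std::string(ent->d_name) works because d_name is NUL-terminated.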

See http://en.cppreference.com/w/cpp/locale/codecvt_utf8 for an example of how to do the conversions (note that std::codecvt_utf8 is deprecated as of C++17).

Tags: c++, character-encoding

Celeritas