Objective-C doesn't like my unichars?

Question

Xcode complains about a "multi-character character constant" when I try to do the following:

static unichar accentCharacters[] = { 'ā', 'á', 'ă', 'à' };

How do you make an array of characters when not all of them are ASCII? The following works just fine:

static unichar accent[] = { 'a', 'b', 'c' };

Workaround

The closest workaround I have found is to write the special characters as hex values, i.e. this works:

static unichar accentCharacters[] = { 0x0100, 0x0101, 0x0102 };
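If the compiler accepts C11 (an assumption; older GCC setups may not), another option is the u'...' UTF-16 character constant combined with \u universal character names, which keeps the source file pure ASCII. Below is a minimal sketch in plain C, which also compiles as Objective-C. The typedef is only a stand-in for Foundation's unichar, and accentCharacterCount is a name made up for this example. The code points are those of the four characters in the question.

```c
#include <stddef.h>

/* Stand-in for Foundation's unichar (a 16-bit UTF-16 code unit).
   Drop this typedef when Foundation is imported. */
typedef unsigned short unichar;

/* u'...' is the C11 UTF-16 character constant; \uXXXX names the code
   point directly, so the result does not depend on the file encoding. */
static const unichar accentCharacters[] = {
    u'\u0101', /* ā  LATIN SMALL LETTER A WITH MACRON */
    u'\u00E1', /* á  LATIN SMALL LETTER A WITH ACUTE  */
    u'\u0103', /* ă  LATIN SMALL LETTER A WITH BREVE  */
    u'\u00E0', /* à  LATIN SMALL LETTER A WITH GRAVE  */
};

/* Hypothetical helper constant: number of entries in the table. */
static const size_t accentCharacterCount =
    sizeof accentCharacters / sizeof accentCharacters[0];
```

If C11 is not available, the plain hex values (0x0101, 0x00E1, 0x0103, 0x00E0) give the same table.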

Yuji · Accepted Answer

It's not that Objective-C doesn't like it; it's that C doesn't. A constant like 'c' is meant for char, which is 1 byte, not unichar, which is 2 bytes. (See the note below for a bit more detail.)

There's no perfectly supported way to represent a unichar constant. You can use

char* s="ü";

in a UTF-8-encoded source file to get a UTF-8 C string, or

NSString* s=@"ü";

in a UTF-8-encoded source file to get an NSString. (This was not possible before Mac OS X 10.5. It's OK for iPhone.)

NSString itself is conceptually encoding-neutral, but if you want, you can get the UTF-16 code unit (a unichar) at a given position by using -characterAtIndex:.

Finally, two comments:

If you just want to remove accents from a string, you can use a method like this, without writing the table yourself:

-(NSString*)stringWithoutAccentsFromString:(NSString*)s
{
    if (!s) return nil;
    NSMutableString *result = [NSMutableString stringWithString:s];
    CFStringFold((CFMutableStringRef)result, kCFCompareDiacriticInsensitive, NULL);
    return result;
}

See the documentation for CFStringFold.

If you want Unicode characters for localization/internationalization, you shouldn't embed the strings in the source code. Instead, use Localizable.strings and NSLocalizedString. See here.

Note: For arcane historical reasons, 'a' is an int in C; see the discussions here. In C++, it's a char. But that doesn't change the fact that writing more than one byte inside '...' is implementation-defined and not recommended. For example, see the ISO C standard, section 6.4.4.4, paragraph 10. However, it was common in classic Mac OS to write a four-letter code enclosed in single quotes, like 'APPL'. But that's another story...

Another complication is that accented letters are not always represented by 1 byte; it depends on the encoding. In UTF-8, they are not. In ISO-8859-1, they are. And unichar should be in UTF-16. Did you save your source code in UTF-16? I think the default in Xcode is UTF-8. GCC might do some encoding conversion depending on the setup, too...
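To make the byte counts concrete, here is a small sketch in plain C (it compiles as Objective-C too). The names utf8_sequence_length, u_umlaut_utf8, u_umlaut_latin1 and char_constant_has_int_size are made up for this illustration. The "ü" from the examples above is spelled out as explicit bytes so the result does not depend on how the source file is saved.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with `lead`,
   judged from its high bits only (no further validation).
   Returns 0 for a continuation byte or any other non-lead byte. */
static size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1; /* 0xxxxxxx: ASCII   */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx          */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx          */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx          */
    return 0;                            /* 10xxxxxx or other */
}

/* "ü" (U+00FC) in two encodings, written as explicit bytes. */
static const char u_umlaut_utf8[]   = "\xC3\xBC"; /* UTF-8: two bytes     */
static const char u_umlaut_latin1[] = "\xFC";     /* ISO-8859-1: one byte */

/* Returns 1 when 'a' has the size of an int, as it does in C
   (in C++ the same constant would have type char). */
static int char_constant_has_int_size(void)
{
    return sizeof('a') == sizeof(int);
}
```

Since the UTF-8 form of "ü" is two bytes, putting it between single quotes yields a multi-character constant, which is what the warning in the question is about.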

daniel.gindi · Answer

Or you can just do it like this:

static unichar accentCharacters[] = { L'ā', L'á', L'ă', L'à' };

L is a standard C prefix (not a keyword) that marks a wide character constant or wide string literal of type wchar_t, which is how C spells "I'm about to write a Unicode character or string".

Works fine for Objective-C too.

Note: The compiler may give you a strange warning about too many characters inside a character constant, but in my experience you can ignore that warning. Xcode just doesn't deal with the Unicode characters the right way, but the compiler parses them properly and the result is OK.
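One caveat worth checking: an L'...' constant has type wchar_t, which is 32 bits wide on macOS and Linux, while unichar is a 16-bit UTF-16 code unit. The characters in this thread all fit in 16 bits, but anything above U+FFFF would be silently truncated on assignment. A minimal sketch of a guarded conversion in plain C follows. unichar_from_wide is a hypothetical helper, not a Foundation function, and the typedef stands in for Foundation's unichar.

```c
#include <wchar.h>

/* Stand-in for Foundation's unichar (a 16-bit UTF-16 code unit).
   Drop this typedef when Foundation is imported. */
typedef unsigned short unichar;

/* Stores `wc` in *out and returns 1 when it can be represented as a
   single UTF-16 code unit. Returns 0, leaving *out untouched, when the
   value lies above U+FFFF or inside the surrogate range. */
static int unichar_from_wide(wchar_t wc, unichar *out)
{
    unsigned long v = (unsigned long)wc; /* a negative wchar_t becomes huge */

    if (v > 0xFFFFUL)
        return 0;                        /* would need a surrogate pair */
    if (v >= 0xD800UL && v <= 0xDFFFUL)
        return 0;                        /* lone surrogate, not a character */

    *out = (unichar)v;
    return 1;
}
```

For example, unichar_from_wide(L'\u0101', &c) succeeds and leaves 0x0101 in c.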

Matt Comi · Answer

Depending on your circumstances, this may be a tidy way to do it:

NSCharacterSet* accents = 
    [NSCharacterSet characterSetWithCharactersInString:@"āáăà"];

And then, if you want to check if a given unichar is one of those accent characters:

if ([accents characterIsMember:someOtherUnichar])
{
}

NSString also has many methods of its own for handling NSCharacterSet objects.

Tags: xcode, gcc, objective-c

Asked by corydoras
