
Objective-C doesn't like my unichars?

Xcode complains about a "multi-character character constant" when I try to do the following:

static unichar accent_characters[] = { 'ā', 'á', 'ă', 'à' };

How do you make an array of characters when not all of them are ASCII? The following works just fine:

static unichar accent[] = { 'a', 'b', 'c' }; 

Workaround

The closest workaround I have found is to write the special characters as hex values, i.e. this works:

static unichar accent_characters[] = { 0x0100, 0x0101, 0x0102 };
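
As a hedged sketch of the same workaround in plain C, the table below spells out the UTF-16 code units for the four characters from the first snippet (U+0101, U+00E1, U+0103, U+00E0). The unichar typedef is an assumption standing in for Foundation's 16-bit type, and accent_count and accent_at are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for Foundation's unichar, a 16-bit UTF-16 code unit. */
typedef uint16_t unichar;

/* Code units written as hex, so the source file's encoding does not matter. */
static const unichar accent_characters[] = {
    0x0101, /* a with macron */
    0x00E1, /* a with acute  */
    0x0103, /* a with breve  */
    0x00E0  /* a with grave  */
};

/* Hypothetical helper: number of entries in the table. */
static size_t accent_count(void)
{
    return sizeof accent_characters / sizeof accent_characters[0];
}

/* Hypothetical helper: fetch entry i, checking the index in debug builds. */
static unichar accent_at(size_t i)
{
    assert(i < accent_count());
    return accent_characters[i];
}
```

Note that 0x0100 and 0x0102 are the uppercase Ā and Ă; the lowercase ā and ă sit one code point higher, at 0x0101 and 0x0103.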
corydoras asked Jan 28 '10 01:01


3 Answers

It's not that Objective-C doesn't like it; it's that C doesn't. A constant like 'c' is meant for a char, which is 1 byte, not a unichar, which is 2 bytes. (See the note below for a bit more detail.)

There's no perfectly supported way to represent a unichar constant. You can use

char* s="ü";

in a UTF-8-encoded source file to get a UTF-8 C string, or

NSString* s=@"ü";

in a UTF-8-encoded source file to get an NSString. (This was not possible before Mac OS X 10.5. It's OK on iPhone.)

NSString itself is conceptually encoding-neutral, but if you want, you can get the individual UTF-16 code units (unichars) with -characterAtIndex:.

Finally two comments:

  • If you just want to remove accents from a string, you can use a method like this, without writing the table yourself:

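    // Returns a copy of s with diacritical marks stripped (e.g. "á" becomes "a").
    -(NSString*)stringWithoutAccentsFromString:(NSString*)s
    {
        if (!s) return nil;
        // Work on a mutable copy, because CFStringFold modifies the string in place.
        NSMutableString *result = [NSMutableString stringWithString:s];
        CFStringFold((CFMutableStringRef)result, kCFCompareDiacriticInsensitive, NULL);
        return result;
    }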
    

    See the documentation for CFStringFold.

  • If you want unicode characters for localization/internationalization, you shouldn't embed the strings in the source code. Instead you should use Localizable.strings and NSLocalizedString. See here.

Note: For arcane historical reasons, 'a' is an int in C; see the discussions here. In C++, it's a char. But that doesn't change the fact that the value of a character constant containing more than one byte inside '...' is implementation-defined, and writing one is not recommended. For example, see the ISO C Standard, section 6.4.4.4, paragraph 10. However, it was common in classic Mac OS to write a four-letter code enclosed in single quotes, like 'APPL'. But that's another story...

Another complication is that accented letters are not always represented by 1 byte; it depends on the encoding. In UTF-8 they take more than one byte. In ISO-8859-1, the accented letters it covers take exactly one. And a unichar is supposed to hold a UTF-16 code unit. Did you save your source code in UTF-16? I think the default in Xcode is UTF-8. GCC might do some encoding conversion depending on the setup, too...
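
To make the encoding point concrete, here is a minimal C sketch. It spells out the UTF-8 bytes of ā (U+0101) with hex escapes, so it does not depend on how the source file is saved, and decodes them back to a code point. decode_two_byte_utf8 is a hypothetical helper that only handles two-byte sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* UTF-8 encoding of U+0101: two bytes, 0xC4 0x81. */
static const char a_macron_utf8[] = "\xC4\x81";

/* Hypothetical helper: decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
   Returns 0 if the input does not start with a two-byte sequence. */
static uint16_t decode_two_byte_utf8(const char *s)
{
    unsigned char b0, b1;

    assert(s != NULL);
    b0 = (unsigned char)s[0];
    if ((b0 & 0xE0) != 0xC0) return 0;   /* not a two-byte lead byte */
    b1 = (unsigned char)s[1];
    if ((b1 & 0xC0) != 0x80) return 0;   /* not a continuation byte */
    /* 5 payload bits from the lead byte, 6 from the continuation byte. */
    return (uint16_t)(((b0 & 0x1F) << 6) | (b1 & 0x3F));
}
```

Because the character occupies two bytes in UTF-8, writing it between single quotes hands the compiler a two-byte character constant, which is consistent with the "multi-character" complaint in the question.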

Yuji answered Nov 20 '22 01:11


Or you can just do it like this:

static unichar accent_characters[] = { L'ā', L'á', L'ă', L'à' };

L is a standard C prefix (not a keyword) that marks a wide character (wchar_t) constant or string literal, which lets you write characters outside ASCII.

Works fine for Objective-C too.

Note: The compiler may give you a strange warning about too many characters put inside a unichar, but you can safely ignore that warning. Xcode just doesn't deal with the unicode characters the right way, but the compiler parses them properly and the result is OK.
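
One way to sidestep source-encoding issues with this approach is to put a universal character name (a C99 \u escape) inside the wide constant. The sketch below is only an illustration under stated assumptions: the unichar typedef stands in for Foundation's 16-bit type, to_unichar is a hypothetical helper, and the values only fit because all four characters are below U+FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for Foundation's unichar, a 16-bit UTF-16 code unit. */
typedef uint16_t unichar;

/* Wide character constants written with universal character names,
   so the source file itself stays pure ASCII. */
static const unichar accent_characters[] = {
    L'\u0101', /* a with macron */
    L'\u00E1', /* a with acute  */
    L'\u0103', /* a with breve  */
    L'\u00E0'  /* a with grave  */
};

/* Hypothetical helper: narrow a wide character to a 16-bit code unit.
   wchar_t is often wider than 16 bits, so check that the value fits. */
static unichar to_unichar(wchar_t wc)
{
    assert((unsigned long)wc <= 0xFFFFul);
    return (unichar)wc;
}
```

On platforms where wchar_t is 32 bits wide, a character above U+FFFF would not fit in a single unichar, which is why the helper checks the range before narrowing.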

daniel.gindi answered Nov 20 '22 01:11


Depending on your circumstances, this may be a tidy way to do it:

NSCharacterSet* accents = 
    [NSCharacterSet characterSetWithCharactersInString:@"āáăà"];

And then, if you want to check if a given unichar is one of those accent characters:

if ([accents characterIsMember:someOtherUnichar])
{
}

NSString also has many methods of its own for handling NSCharacterSet objects.

Matt Comi answered Nov 20 '22 01:11