I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xce91), which is in UTF-8 format and translates to the Greek capital letter Alpha ('Α'). I'm trying to put that character into an NSString variable, but I fail miserably.
I've tried two different approaches, both of which were unsuccessful:
unichar greekAlpha = 0xce91; //could have written greekAlpha = 'Α' instead.
NSString *theString = [NSString stringWithFormat:@"Greek Alpha: %C", greekAlpha];
No good. I get some weird Chinese characters. As a side note, this works perfectly with English characters.
Then I also tried this:
NSString *byteString = [[NSString alloc] initWithBytes:&greekAlpha
                                                length:sizeof(unichar)
                                              encoding:NSUTF8StringEncoding];
But this doesn't work either. I'm obviously doing something terribly wrong, but I don't know what. Can someone help me, please? Thanks!
(NSString *) is simply the type of the argument: a string object, which is the NSString class in Cocoa. In Objective-C you're always dealing with object references (pointers), so the "*" indicates that the argument is a reference to an NSString object.
An NSString object can be initialized from or written to a C buffer, an NSData object, or the contents of an NSURL. It can also be encoded and decoded to and from ASCII, UTF-8, UTF-16, UTF-32, or any other string encoding represented by NSStringEncoding.
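For instance, a minimal sketch of that encode/decode round trip (the literal string and variable names here are only illustrative):
NSString *original = @"ΑΒΓ";                      // Greek letters typed directly into a literal
NSData *utf8Bytes = [original dataUsingEncoding:NSUTF8StringEncoding];
NSString *decoded = [[[NSString alloc] initWithData:utf8Bytes
                                           encoding:NSUTF8StringEncoding] autorelease];
For the single character in the question, though, the simplest route is stringWithCharacters:length: with the UTF-16 code unit: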
unichar greekAlpha = 0x0391;
NSString* s = [NSString stringWithCharacters:&greekAlpha length:1];
And now you can incorporate that NSString into another in any way you like. Do note, however, that it is now legal to type a Greek alpha directly into an NSString literal.
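For example, both lines below produce the same text (the variable names are only for illustration):
NSString *labelled = [NSString stringWithFormat:@"Greek Alpha: %@", s];   // embed the one-character string
NSString *literal = @"Greek Alpha: Α";                                    // or type the character directly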
Since 0xce91 is in UTF-8 format and %C expects UTF-16, a simple solution like the one above won't work. For stringWithFormat:@"%C" to work you need to pass 0x391, which is the UTF-16 code unit. In order to create a string from the UTF-8 encoded unichar, you first need to split the value into its octets and then use initWithBytes:length:encoding:.
unichar utf8char = 0xce91;          // two UTF-8 bytes packed into one 16-bit value
char chars[2];
int len = 1;
if (utf8char > 127) {
    // Multi-byte sequence: split the value into its two octets.
    chars[0] = (utf8char >> 8) & 0xFF;
    chars[1] = utf8char & 0xFF;
    len = 2;
} else {
    // Plain ASCII: a single byte is enough.
    chars[0] = utf8char;
}
NSString *string = [[NSString alloc] initWithBytes:chars
                                            length:len
                                          encoding:NSUTF8StringEncoding];
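For completeness, the %C route mentioned above does work once you start from the UTF-16 code unit rather than the packed UTF-8 bytes:
unichar utf16Alpha = 0x0391;    // the UTF-16 code unit for Α, not the UTF-8 bytes 0xce91
NSString *viaFormat = [NSString stringWithFormat:@"Greek Alpha: %C", utf16Alpha];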
The above answer is great, but it doesn't account for UTF-8 sequences longer than two bytes, such as the ellipsis symbol "…" (0xE2, 0x80, 0xA6). Here's a tweak to the code; note that the value now needs a type wider than a 16-bit unichar, such as uint32_t, and a four-byte buffer:
uint32_t utf8char = 0xE280A6;       // three UTF-8 bytes packed into a 32-bit value
char chars[4];
if (utf8char > 65535) {
    // Three-byte sequence (e.g. "…"): split into three octets plus a NUL terminator.
    chars[0] = (utf8char >> 16) & 255;
    chars[1] = (utf8char >> 8) & 255;
    chars[2] = utf8char & 255;
    chars[3] = 0x00;
} else if (utf8char > 127) {
    // Two-byte sequence (e.g. "Α").
    chars[0] = (utf8char >> 8) & 255;
    chars[1] = utf8char & 255;
    chars[2] = 0x00;
} else {
    // Plain ASCII.
    chars[0] = utf8char;
    chars[1] = 0x00;
}
NSString *string = [[[NSString alloc] initWithUTF8String:chars] autorelease];
Note the different string initialisation method, which doesn't require a length parameter because initWithUTF8String: expects a NUL-terminated C string (hence the trailing 0x00 bytes above).
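If this conversion comes up more than once, the same octet-splitting logic can be wrapped in a small helper. The function below is a hedged sketch: its name, and the convention of packing the UTF-8 bytes into a uint32_t, are just choices made for this example.
// Hypothetical helper: turns up to four UTF-8 bytes packed into a uint32_t
// (e.g. 0xCE91 for "Α", 0xE280A6 for "…") into an NSString.
static NSString *StringFromPackedUTF8(uint32_t packed) {
    char bytes[5] = {0};                                // at most 4 octets plus a NUL terminator
    int i = 0;
    for (int shift = 24; shift >= 0; shift -= 8) {
        unsigned char b = (packed >> shift) & 0xFF;     // peel octets from most to least significant
        if (b != 0 || i > 0) {                          // skip leading zero bytes only
            bytes[i++] = (char)b;
        }
    }
    return [NSString stringWithUTF8String:bytes];
}
// Usage:
NSString *alpha = StringFromPackedUTF8(0xCE91);         // "Α"
NSString *ellipsis = StringFromPackedUTF8(0xE280A6);    // "…"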