 

NSString from Unicode

I have a bunch of Unicode code points wrapped up in NSNumber objects, like so:

@(0x1f4de),    // EntypoIconTypePhone
@(0x1f4f1),    // EntypoIconTypeMobile
@(0xe789),     // EntypoIconTypeMouse
@(0xe723),     // EntypoIconTypeAddress
@(0x2709),     // EntypoIconTypeMail
@(0x1f53f),    // EntypoIconTypePaperPlane
@(0x270e),     // EntypoIconTypePencil

These are icons from the Entypo font (highly recommended).

This is the code I am using to create an NSString from the code point:

NSNumber *u = self.unicodeLookup[type];

int unicode = [u intValue];
UniChar chars[] = {unicode};

NSString *string = [[NSString alloc] initWithCharacters:chars length:sizeof(chars) / sizeof(UniChar)];

What I am finding is that some of these icons are created as expected, but not all of them; from what I can see, it is the code points with five hex digits that are not being created properly.

For example, these work:

@(0xe723),     // EntypoIconTypeAddress
@(0x2709),     // EntypoIconTypeMail

but these don't:

@(0x1f4de),    // EntypoIconTypePhone
@(0x1f4f1),    // EntypoIconTypeMobile

I'm pretty sure the problem is in my conversion code. I don't really understand all this encoding malarkey.

asked Oct 04 '22 by Lee Probert


1 Answer

If you store your character constants using unichar, rather than NSNumber objects, then the compiler itself will tell you the reason:

unichar chars[] = 
{
    0xe723,     // EntypoIconTypeAddress
    0x2709,     // EntypoIconTypeMail
    0x1f4de,    // EntypoIconTypePhone
    0x1f4f1     // EntypoIconTypeMobile
};

Implicit conversion from 'int' to 'unichar' (aka 'unsigned short') changes value from 128222 to 62686
Implicit conversion from 'int' to 'unichar' (aka 'unsigned short') changes value from 128241 to 62705

As iOS/OS X uses a 16-bit (UTF-16) representation of Unicode characters internally, and 0x1f4de and 0x1f4f1 both lie outside the Basic Multilingual Plane (they don't fit in 16 bits), you are going to need to encode those characters as surrogate pairs:

a = 0x1f4de - 0x10000 = 0xf4de
high = a >> 10 = 0x3d
low = a & 0x3ff = 0xde
w1 = high + 0xd800 = 0xd83d
w2 = low + 0xdc00 = 0xdcde

0x1f4de (UTF-32) = 0xd83d 0xdcde (UTF-16)

(See this Wikipedia page).
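
Expressed in code, the same arithmetic looks roughly like the sketch below (the helper name splitIntoSurrogatePair is purely illustrative, not a framework function):

#import <Foundation/Foundation.h>

// Minimal sketch: split a supplementary-plane code point (above U+FFFF)
// into its UTF-16 surrogate pair, mirroring the arithmetic above.
static void splitIntoSurrogatePair(uint32_t codePoint, unichar *high, unichar *low)
{
    uint32_t a = codePoint - 0x10000;          // 0x1f4de -> 0xf4de
    *high = (unichar)((a >> 10)  + 0xd800);    //         -> 0xd83d
    *low  = (unichar)((a & 0x3ff) + 0xdc00);   //         -> 0xdcde
}

// Usage: build the two-unit UTF-16 sequence and hand it to NSString.
unichar pair[2];
splitIntoSurrogatePair(0x1f4de, &pair[0], &pair[1]);
NSString *phone = [[NSString alloc] initWithCharacters:pair length:2];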

The upshot is that you cannot use a flat array with one unichar per icon; you have to know how many UTF-16 code units each character's encoding requires.
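
One way to sidestep the surrogate math entirely (a sketch, assuming the same self.unicodeLookup dictionary as in the question) is to hand NSString the code point as UTF-32 data and let Foundation do the conversion:

NSNumber *u = self.unicodeLookup[type];
uint32_t codePoint = (uint32_t)[u unsignedIntValue];   // e.g. 0x1f4de

// NSUTF32LittleEndianStringEncoding copes with both BMP and
// supplementary-plane characters, so no manual surrogate handling is needed.
NSString *string = [[NSString alloc] initWithBytes:&codePoint
                                            length:sizeof(codePoint)
                                          encoding:NSUTF32LittleEndianStringEncoding];

This should work for all of the icons above, including the five-hex-digit ones.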

answered Oct 09 '22 by trojanfoe