Dynamically create NSString with Unicode emoji

I have the string @"Hi there! \U0001F603", which correctly shows the emoji like Hi there! 😃 if I put it in a UILabel.

But I want to create it dynamically like [NSString stringWithFormat:@"Hi there! \U0001F60%ld", (long)arc4random_uniform(10)], but it doesn't even compile. If I double the backslash, it shows the Unicode value literally like Hi there! \U0001F605.

How can I achieve this?

Iulian Onofrei asked Mar 12 '23 20:03


2 Answers

A step back, for a second: that number that you have, 1F603 (hexadecimal), is a Unicode code point, which, to put it as simply as possible, is the index of this emoji in the list of all Unicode items. That's not the same thing as the bytes that the computer actually handles, which are the "encoded value" (technically, the code units).

When you write the literal @"\U0001F603" in your code, the compiler does the encoding for you, writing the necessary bytes.* If you don't have the literal at compile time, you must do the encoding yourself. That is, you must transform the code point into a set of bytes that represent it. For example, in the UTF-16 encoding that NSString uses internally, your code point is represented by the surrogate pair D83D DE03, which is the byte sequence 3d d8 03 de in little-endian order (preceded by the byte order mark ff fe when one is written).
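To make "doing the encoding yourself" concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of how a code point maps to UTF-16 code units. The helper name `utf16_encode` is hypothetical, and the function assumes its input is already a valid code point; it is an illustration, not what the compiler literally runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 code units.
   Assumes cp is a valid Unicode scalar value (not checked here).
   Writes 1 or 2 units into out and returns how many were written. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: the code unit equals the code point. */
        out[0] = (uint16_t)cp;
        return 1;
    }
    /* Above U+FFFF: split the remaining 20 bits into a surrogate pair. */
    uint32_t v = cp - 0x10000;
    out[0] = (uint16_t)(0xD800 + (v >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));  /* low surrogate */
    return 2;
}
```

For 0x1F603 this yields the two code units 0xD83D and 0xDE03.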

You can't, at run time, modify that literal and end up with the correct bytes, because the compiler has already done its work and gone to bed.

(You can read in depth about this stuff and how it pertains to NSString in an article by Ole Begemann at objc.io.)

Fortunately, one of the available encodings, UTF-32, represents code points directly: the value of the bytes is the same as the code point's. In other words, if you assign your code point number to a 32-bit unsigned integer, you've got proper UTF-32-encoded data.

That leads us to the process you need:

// Encoded start point
uint32_t base_point_UTF32 = 0x1F600;

// Generate random point
uint32_t offset = arc4random_uniform(10);
uint32_t new_point = base_point_UTF32 + offset;

// Read the four bytes into NSString, interpreted as UTF-32LE.
// Intel machines and iOS on ARM are little endian; others byte swap/change 
// encoding as necessary.
NSString * emoji = [[NSString alloc] initWithBytes:&new_point
                                            length:4
                                          encoding:NSUTF32LittleEndianStringEncoding];

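(N.B.: this may not work as expected for an arbitrary code point; not all code points are valid. Values above 10FFFF and the surrogate range D800 to DFFF cannot be encoded, and many values in between are not assigned to any character.)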
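If the code point comes from arithmetic rather than a fixed table, a cheap guard before building the string may be worthwhile. The following is a sketch in plain C; `is_unicode_scalar` is a hypothetical name. It only rejects values that can never be encoded, and it cannot tell whether a value is actually assigned to a character or has a glyph in the current font.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard: true if cp can be encoded at all, i.e. it is
   at most U+10FFFF and not in the surrogate range U+D800..U+DFFF. */
static bool is_unicode_scalar(uint32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

A caller could check `is_unicode_scalar(new_point)` and fall back to a default string when it returns false.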


*Note, it does the same thing for "normal" strings like @"b", as well.

jscs answered Mar 19 '23 20:03


\U0001F603 is a literal which is evaluated at compile time. You want a solution which can be executed at runtime.

So you want to have a string with a dynamic Unicode character. %C is the format specifier for a Unicode character (unichar).

[NSString stringWithFormat:@"Hi there! %C", (unichar)(0x01F600 + arc4random_uniform(10))];

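This does not work: unichar is only 16 bits wide, which is too small for emoji code points above U+FFFF, so the cast truncates the value. Thanks @JoshCaswell for correcting me.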
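The truncation can be seen without any Foundation code. This sketch in plain C models the (unichar) cast with `uint16_t`, since unichar is a 16-bit unsigned type; the function name `truncate_to_unichar` is made up for illustration.

```c
#include <stdint.h>

/* Models the (unichar) cast: only the low 16 bits of the code point
   survive, so anything above U+FFFF becomes a different character. */
static uint16_t truncate_to_unichar(uint32_t code_point) {
    return (uint16_t)code_point;
}
```

For 0x1F603 the result is 0xF603, which lies in the Private Use Area rather than in the emoji block.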


Update: a working answer

@JoshCaswell has the correct answer with -initWithBytes:length:encoding:, but I think I can write a better wrapper.

  1. Create a function to do all the work.
  2. Use network ordering for a standard byte order.
  3. No magic number for the length.

Here is my answer

#include <arpa/inet.h> // htonl

NSString *MyStringFromUnicodeCharacter(uint32_t character) {
    // Convert the character to a known ordering (network byte order is big-endian)
    uint32_t bytes = htonl(character);
    return [[NSString alloc] initWithBytes:&bytes length:sizeof(uint32_t) encoding:NSUTF32StringEncoding];
}
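To show what "a known ordering" means for the bytes handed to NSString, here is a small sketch in plain C that writes the code point out most significant byte first, which is the layout htonl produces in memory on any host. The helper `utf32be_bytes` is hypothetical and is not part of the answer's wrapper.

```c
#include <stdint.h>

/* Hypothetical helper: store cp as four bytes, most significant first
   (big-endian, i.e. network byte order), regardless of host endianness. */
static void utf32be_bytes(uint32_t cp, uint8_t out[4]) {
    out[0] = (uint8_t)(cp >> 24);
    out[1] = (uint8_t)(cp >> 16);
    out[2] = (uint8_t)(cp >> 8);
    out[3] = (uint8_t)cp;
}
```

For 0x1F603 the bytes are 00 01 f6 03, the reverse of the little-endian layout used in the other answer.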

So, in use…

NSString *emoji = MyStringFromUnicodeCharacter(0x01F600 + arc4random_uniform(10));
NSString *message = [NSString stringWithFormat:@"Hi there! %@", emoji];

Update 2

Finally, put it in a category to make it real Objective-C.

@interface NSString (MyString)
+ (instancetype)stringWithUnicodeCharacter:(uint32_t)character;
@end
@implementation NSString (MyString)
+ (instancetype)stringWithUnicodeCharacter:(uint32_t)character {
    uint32_t bytes = htonl(character); // Convert the character to a known ordering
    return [[NSString alloc] initWithBytes:&bytes length:sizeof(uint32_t) encoding:NSUTF32StringEncoding];
}
@end

And again, in use…

NSString *emoji = [NSString stringWithUnicodeCharacter:0x01F600 + arc4random_uniform(10)];
NSString *message = [NSString stringWithFormat:@"Hi there! %@", emoji];

Jeffery Thomas answered Mar 19 '23 21:03