I have an NSString containing a Unicode character above U+FFFF, such as the MUSICAL SYMBOL G CLEF '𝄞'. I can create the NSString and display it.
NSString *s = @"A\U0001d11eB"; // "A𝄞B"
NSLog(@"String = \"%@\"", s);
The log is correct and displays the 3 characters, so the NSString is well formed and there is no encoding problem.
String = "A𝄞B"
But when I try to loop through all characters using the method
- (unichar)characterAtIndex:(NSUInteger)index
everything goes wrong.
The type unichar is only 16 bits, so I expect to get a wrong character for the musical symbol. But the length of the string is not what I expect either!
NSLog(@"Length = %d", [s length]);
for (int i=0; i<[s length]; i++)
{
NSLog(@" Character %d = %c", i, [s characterAtIndex:i]);
}
displays
Length = 4
Character 0 = A
Character 1 = 4
Character 2 = .
Character 3 = B
What methods should I use to correctly parse my NSString and get my 3 Unicode characters? Ideally the right method would return a type like wchar_t instead of unichar.
Thank you
Unicode allows for 17 planes, each of 65,536 possible characters (or 'code points'). This gives a total of 1,114,112 possible characters.
Unicode defines several encoding forms, including UTF-8, UTF-16, and UTF-32. NSString exposes its contents as UTF-16, where most characters fit in a single 16-bit (2-byte) code unit. A code point is usually written as U+hhhh, where hhhh is its hexadecimal value.
Unicode is a universal character set that aims to include all the characters needed for any writing system or language. The first 65,536 code points form the Basic Multilingual Plane (BMP), which fits in 16 bits and covers the most commonly used characters of many languages.
In UTF-8, each character is encoded as 1 to 4 bytes; the first 128 Unicode code points (the ASCII range) are encoded as a single byte.
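As an aside, you can make this variable-length encoding visible by dumping the raw UTF-8 bytes of the example string. This is a minimal sketch (variable names are mine); 'A' and 'B' come out as one byte each, and U+1D11E as four bytes:

NSString *s = @"A\U0001d11eB";
NSData *utf8 = [s dataUsingEncoding:NSUTF8StringEncoding];
const unsigned char *bytes = [utf8 bytes];
for (NSUInteger i = 0; i < [utf8 length]; i++) {
    // Each element is one raw UTF-8 byte of the string.
    NSLog(@"Byte %lu = 0x%02X", (unsigned long)i, (unsigned int)bytes[i]);
}
// Expected bytes: 0x41 ('A'), 0xF0 0x9D 0x84 0x9E (U+1D11E), 0x42 ('B')

The code below takes the complementary approach and converts the same string to UTF-32, so that each code point becomes one 32-bit value: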
NSString *s = @"A\U0001d11eB";
NSData *data = [s dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
const wchar_t *wcs = [data bytes];
for (int i = 0; i < [data length]/4; i++) {
NSLog(@"%#010x", wcs[i]);
}
Output:
0x00000041 0x0001d11e 0x00000042
(The code assumes that wchar_t has a size of 4 bytes and little-endian byte order.)
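If you would rather not rely on the size of wchar_t, a fixed-width integer type can be used instead. Here is a minimal sketch of the same idea using uint32_t (the choice of type and names is mine; it still assumes a little-endian host, matching the encoding requested above):

#include <stdint.h>

NSString *s = @"A\U0001d11eB";
NSData *data = [s dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
const uint32_t *codePoints = [data bytes]; // one element per Unicode code point
for (NSUInteger i = 0; i < [data length] / sizeof(uint32_t); i++) {
    NSLog(@"U+%06X", (unsigned int)codePoints[i]);
}
// Expected output: U+000041, U+01D11E, U+000042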
length and characterAtIndex: do not give the result you expect because \U0001d11e is stored internally as a UTF-16 surrogate pair.
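You can see the surrogate pair directly by logging each UTF-16 code unit in hex; a small sketch:

NSString *s = @"A\U0001d11eB";
for (NSUInteger i = 0; i < [s length]; i++) {
    // Each unichar is one UTF-16 code unit, not necessarily a whole character.
    NSLog(@"UTF-16 unit %lu = 0x%04X", (unsigned long)i, (unsigned int)[s characterAtIndex:i]);
}
// Expected: 0x0041, 0xD834, 0xDD1E, 0x0042
// 0xD834 0xDD1E is the surrogate pair that encodes U+1D11E, which is why the length is 4.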
Another useful method for general Unicode strings is
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByComposedCharacterSequences
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    NSLog(@"%@", substring);
}];
Output:
A 𝄞 B
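The same enumeration can also be used to count composed character sequences, which gives the 3 "characters" you expect even though [s length] is 4. A minimal sketch:

__block NSUInteger count = 0;
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByComposedCharacterSequences
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    count++; // one increment per user-perceived character
}];
NSLog(@"Number of composed characters = %lu", (unsigned long)count);
// Output: Number of composed characters = 3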