
Why do NSString and NSLog appear to handle %C and %lc (and %S and %ls) differently?

Apple's String Format Specifiers document claims,

The format specifiers supported by the NSString formatting methods and CFString formatting functions follow the IEEE printf specification; … You can also use these format specifiers with the NSLog function.

But, while the printf specification defines %C as an equivalent for %lc and %S as an equivalent for %ls, only %C and %S appear to work correctly with NSLog and +[NSString stringWithFormat:].

For example, consider the following code:

#import <Foundation/Foundation.h>

int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    unichar str[3];
    str[0] = 63743;  // U+F8FF, a private-use code point
    str[1] = 33;     // '!'
    str[2] = 0;      // null terminator (NULL is a pointer constant, not a character)

    NSLog(@"NSLog");
    NSLog(@"%%S:  %S", str);
    NSLog(@"%%ls: %ls", str);

    NSLog(@"%%C:  %C", str[0]);
    NSLog(@"%%lc: %lc", str[0]);

    NSLog(@"\n");
    NSLog(@"+[NSString stringWithFormat:]");

    NSLog(@"%%S:  %@", [NSString stringWithFormat:@"%S", str]);
    NSLog(@"%%ls: %@", [NSString stringWithFormat:@"%ls", str]);

    NSLog(@"%%C:  %@", [NSString stringWithFormat:@"%C", str[0]]);
    NSLog(@"%%lc: %@", [NSString stringWithFormat:@"%lc", str[0]]);

    [pool drain];
    return 0;
}

Given the printf specification, I would expect each of the above pairs to print the same thing. But, when I run the code, I get the following output:

2009-03-20 17:00:13.363 UnicharFormatSpecifierTest[48127:10b] NSLog
2009-03-20 17:00:13.365 UnicharFormatSpecifierTest[48127:10b] %S:  !
2009-03-20 17:00:13.366 UnicharFormatSpecifierTest[48127:10b] %ls: ˇ¯!
2009-03-20 17:00:13.366 UnicharFormatSpecifierTest[48127:10b] %C:  
2009-03-20 17:00:13.367 UnicharFormatSpecifierTest[48127:10b] %lc: 
2009-03-20 17:00:13.367 UnicharFormatSpecifierTest[48127:10b] 
2009-03-20 17:00:13.368 UnicharFormatSpecifierTest[48127:10b] +[NSString stringWithFormat:]
2009-03-20 17:00:13.368 UnicharFormatSpecifierTest[48127:10b] %S:  !
2009-03-20 17:00:13.369 UnicharFormatSpecifierTest[48127:10b] %ls: ˇ¯!
2009-03-20 17:00:13.369 UnicharFormatSpecifierTest[48127:10b] %C:  
2009-03-20 17:00:13.370 UnicharFormatSpecifierTest[48127:10b] %lc: 

Am I doing something wrong, or is this a bug in Apple's code?

Evan DiBiase asked Mar 20 '09 21:03


1 Answer

On Mac OS X, <machine/_types.h> defines wchar_t as int, so it's four bytes (32 bits) on all currently-supported architectures.

As you note, the printf(3) manpage defines %S as equivalent to %ls, which takes a pointer to some wchar_t characters (wchar_t *).
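To make the printf side concrete, here is a minimal sketch in plain C of what %ls expects: a pointer to wchar_t units, not unichars. The helper name format_wide is hypothetical and exists only for illustration, and the sketch sticks to ASCII text so the result does not depend on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: format a null-terminated wchar_t string with the
   printf-style %ls conversion. Returns whatever snprintf returns. */
int format_wide(char *out, size_t cap, const wchar_t *wide) {
    assert(out != NULL && wide != NULL);
    return snprintf(out, cap, "%ls", wide);
}

/* Size in bytes of one wchar_t unit on the platform this is compiled for. */
size_t wide_unit_size(void) {
    return sizeof(wchar_t);
}
```

Calling format_wide(buf, sizeof buf, L"hi!") fills buf with "hi!". On Mac OS X, wide_unit_size() should report 4, which is why an array of 2-byte unichars is not what this conversion is designed to read.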

The Cocoa documentation you linked to (and its CF equivalent), however, does define %S separately:

  • %S: Null-terminated array of 16-bit Unicode characters

Note the "16-bit" (emphasis added). The same goes for %C, which the same document describes as a single 16-bit Unicode character (unichar).

So, this is not a bug. CF and Cocoa interpret %S and %C differently from how printf and its cousins interpret them. CF and Cocoa treat the character(s) as UTF-16, whereas printf (presumably) treats them as UTF-32.
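The width mismatch can be sketched in plain C without any Apple APIs. This is only an illustration of reading the same bytes as 16-bit versus 32-bit units, not a reproduction of what NSLog does internally. The names kSample, length_as_16bit, and first_as_32bit are hypothetical, and the sample array carries an extra zero unit so that the 4-byte read stays inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The question's string as 16-bit units (U+F8FF, '!', terminator), padded
   with one extra zero unit so a 4-byte read never leaves the array. */
const uint16_t kSample[4] = { 0xF8FF, 0x0021, 0x0000, 0x0000 };

/* Count 16-bit units before the first zero unit: the 16-bit (%S-style) view. */
size_t length_as_16bit(const void *text) {
    assert(text != NULL);
    const unsigned char *bytes = text;
    size_t n = 0;
    uint16_t unit;
    for (;;) {
        memcpy(&unit, bytes + n * sizeof unit, sizeof unit);
        if (unit == 0) {
            return n;
        }
        n++;
    }
}

/* Read the first 32-bit unit from the same bytes: the view a consumer of
   4-byte wchar_t units would have. */
uint32_t first_as_32bit(const void *text) {
    assert(text != NULL);
    uint32_t unit;
    memcpy(&unit, text, sizeof unit);
    return unit;
}
```

Here length_as_16bit(kSample) is 2, matching the two characters in the question. first_as_32bit(kSample) fuses both characters into one unit (0x0021F8FF on a little-endian machine), which is above U+10FFFF and so is not a valid code point at all.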

The CF/Cocoa interpretation is more useful when working with Core Services, as some APIs (such as the File Manager) will hand you text as an array of UniChars, not a CFString; as long as you null-terminate that array, you can use it with %S to print the string.

Peter Hosey answered Sep 27 '22 16:09