
NSLog incorrect encoding

Tags:

objective-c

I've got a problem with the following code:

NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(@"%s", temp);

The first line of the code assigns a string literal containing two Chinese characters. The problem is that printf displays the Chinese characters properly, but NSLog does not.


Thanks to all. I figured out a solution to this problem. Foundation uses UTF-16 by default, so to output the C string in the example with NSLog, I have to use cStringUsingEncoding: to get a UTF-16 C string and use %S in place of %s.
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
strcpy(temp, [strValue cStringUsingEncoding:NSUTF16LittleEndianStringEncoding]);
NSLog(@"%S", temp);
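One caveat with copying UTF-16 data through strcpy, sketched here in plain C (which also compiles as Objective-C): strcpy stops at the first zero byte, and every ASCII character encoded as UTF-16LE contains one. The helper names below (encode_utf16le, bytes_seen_by_strcpy) are hypothetical and exist only for this illustration; they are not part of Foundation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode BMP code units as UTF-16LE bytes followed by
   a 16-bit zero terminator. Returns the number of bytes written, not
   counting the terminator. `out` must hold at least 2 * (count + 1) bytes. */
size_t encode_utf16le(const uint16_t *units, size_t count, char *out)
{
    size_t i;
    assert(units != NULL && out != NULL);
    for (i = 0; i < count; i++) {
        out[2 * i]     = (char)(unsigned char)(units[i] & 0xFF); /* low byte first */
        out[2 * i + 1] = (char)(unsigned char)(units[i] >> 8);   /* then high byte */
    }
    out[2 * count]     = 0;
    out[2 * count + 1] = 0;
    return 2 * count;
}

/* Number of bytes strcpy would copy before its terminator:
   it stops at the first zero byte, exactly like strlen. */
size_t bytes_seen_by_strcpy(const char *utf16le)
{
    assert(utf16le != NULL);
    return strlen(utf16le);
}
```

For the two code units 0x4F60 and 0x597D ("你好"), neither byte of either unit is zero, so all four bytes survive the copy. For a string that starts with "A" (0x0041), the second byte is already zero and bytes_seen_by_strcpy reports 1, so the copy would be cut off after a single byte.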
Jiang asked Apr 06 '09 01:04


2 Answers

My guess is that NSLog assumes an encoding other than UTF-8 for 8-bit C strings, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:

NSLog(@"%@", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);
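Conceptually, building a string from a C string with an explicit UTF-8 encoding means decoding the byte sequence into code points instead of treating each byte as a character. The following is a minimal C sketch of that decoding step, not Foundation's actual implementation; decode_utf8 is a hypothetical name, and the decoder is simplified (it does not reject overlong forms or surrogate values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into code points.
   Returns the number of code points written to `out`, or (size_t)-1 if the
   input is malformed or `out` (capacity `cap`) is too small. */
size_t decode_utf8(const char *s, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (*p) {
        uint32_t cp;
        int extra; /* number of continuation bytes that must follow */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        p++;
        while (extra-- > 0) {
            /* A continuation byte looks like 10xxxxxx; this check also
               catches a string that ends in the middle of a sequence. */
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
            p++;
        }
        if (n == cap) return (size_t)-1;
        out[n++] = cp;
    }
    return n;
}
```

Calling decode_utf8 on the bytes E4 BD A0 E5 A5 BD (the UTF-8 form of "你好") yields two code points, U+4F60 and U+597D, which is the two-character string the %@ specifier then prints.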
Don McCaughey answered Sep 17 '22 12:09


NSLog's %s format specifier interprets its argument in the system encoding, which seems to always be MacRoman rather than Unicode, so it can only display characters that MacRoman can represent. Your best option with NSLog is to use the native object format specifier %@ and pass the NSString directly instead of converting it to a C string. If you only have a C string and you want to display it with NSLog rather than printf or asl, you will have to do something like what Don suggests and convert the string to an NSString object first.

So, all of these should display the expected string:

NSString *str = @"你好";
const char *cstr = [str UTF8String];
NSLog(@"%@", str);
printf("%s\n", cstr);
NSLog(@"%@", [NSString stringWithUTF8String:cstr]);
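To see why a single-byte interpretation garbles the text, it can help to look at the raw bytes of the C string. The sketch below is plain C with hypothetical helper names (single_byte_length, utf8_length, hex_dump); it only counts and prints bytes and makes no claim about how NSLog is implemented internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Characters a single-byte encoding sees: one per byte. */
size_t single_byte_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Characters a UTF-8 reader sees: every byte that is not a
   continuation byte (10xxxxxx) starts a new character. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}

/* Write the bytes of `s` into `out` as space-separated uppercase hex.
   `cap` must be at least 3 * strlen(s) + 1. Returns `out`. */
char *hex_dump(const char *s, char *out, size_t cap)
{
    size_t len, i;
    assert(s != NULL && out != NULL);
    len = strlen(s);
    assert(cap >= 3 * len + 1);
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* two hex digits per byte, with a space between bytes */
        sprintf(out + 3 * i, "%02X%s", (unsigned)(unsigned char)s[i],
                i + 1 < len ? " " : "");
    }
    return out;
}
```

For the UTF-8 C string of "你好", hex_dump produces E4 BD A0 E5 A5 BD, utf8_length returns 2, and single_byte_length returns 6. A reader that assumes one byte per character therefore shows six unrelated characters instead of the two intended ones.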

If you do decide to use asl, note that while it accepts strings in UTF-8 and passes the correct encoding to the syslog daemon (so the text shows up properly in the console), it applies a visual encoding to the string when displaying to the terminal or logging to a file handle, so non-ASCII values are displayed as escaped character sequences.

Jason Coco answered Sep 19 '22 12:09