NSString isEqualToString: doesn't work

I use this code in my app, and I just found that it gives the wrong result when comparing Korean strings:

        for (NSString *lang in array) {
            NSString *currentLang = [[MLLanguage sharedInstance] lang];
            BOOL flag = [lang isEqualToString:currentLang];
            NSLog(@"\n'%@' isEqual to '%@', %d\n%@\n%@", lang, currentLang, flag?1:0, [lang dataUsingEncoding:NSUTF8StringEncoding], [currentLang dataUsingEncoding:NSUTF8StringEncoding]);
        }

Wrong result: the two identical-looking Korean words compare as different:

        2012-06-19 21:16:52.681 Motilink[10188:11903] -[MLSettingLanguageViewController loadDownloadedData][Line 50] 
        'English' isEqual to '한국어', 0
        <456e676c 697368>
        <ed959cea b5adec96 b4>
        2012-06-19 21:16:52.682 Motilink[10188:11903] -[MLSettingLanguageViewController loadDownloadedData][Line 50] 
        '한국어' isEqual to '한국어', 0
        <e18492e1 85a1e186 abe18480 e185aee1 86a8e184 8be185a5>
        <ed959cea b5adec96 b4>
        2012-06-19 21:16:52.682 Motilink[10188:11903] -[MLSettingLanguageViewController loadDownloadedData][Line 50] 
        '中国语' isEqual to '한국어', 0
        <e4b8ade5 9bbde8af ad>
        <ed959cea b5adec96 b4>

Correct result (when the current language is Chinese):

        2012-06-19 21:35:00.908 Motilink[10188:11903] -[MLSettingLanguageViewController loadDownloadedData][Line 50] 
        'English' isEqual to '中国语', 0
        <456e676c 697368>
        <e4b8ade5 9bbde8af ad>
        2012-06-19 21:35:00.909 Motilink[10188:11903] -[MLSettingLanguageViewController loadDownloadedData][Line 50] 
        '한국어' isEqual to '中国语', 0
        <e18492e1 85a1e186 abe18480 e185aee1 86a8e184 8be185a5>
        <e4b8ade5 9bbde8af ad>
        2012-06-19 21:35:00.909 Motilink[10188:11903] -[MLSettingLanguageViewController loadDownloadedData][Line 50] 
        '中国语' isEqual to '中国语', 1
        <e4b8ade5 9bbde8af ad>
        <e4b8ade5 9bbde8af ad>

It seems that NSString picks an encoding by itself:

English uses only 7 bytes, like ASCII.

Chinese uses 9 bytes, maybe UTF-8.

But for Korean, two different byte sequences appear.

Does anyone know why?

Galen Zhao asked Jun 19 '12 at 12:06



1 Answer

The problem here is that you are comparing non-normalized strings. In Unicode, a character can either be represented directly by a single precomposed code point, or composed from several code points. For example, the German character "ä" can be represented either by the single code point U+00E4 or by the sequence "a" (U+0061) followed by U+0308 COMBINING DIAERESIS.

You have the same problem with the Korean strings: while they look the same in the output, one of them is decomposed into conjoining Hangul jamo (8 code points, 24 bytes of UTF-8), while the other uses precomposed Hangul syllables (3 code points, 9 bytes of UTF-8).

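One way to work around this problem is to normalize all your strings with -[NSString precomposedStringWithCanonicalMapping], which converts a string to Unicode Normalization Form C (NFC, precomposed characters), before comparing them: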

        BOOL flag = [[lang precomposedStringWithCanonicalMapping] isEqualToString:
                     [currentLang precomposedStringWithCanonicalMapping]];
Tammo Freese answered Oct 18 '22 at 11:10
