
(# ゚Д゚) is a 5-letter-word. But in iOS, [@"(# ゚Д゚)" length] is 7. Why?

(# ゚Д゚) is a 5-letter-word. But in iOS, [@"(# ゚Д゚)" length] is 7.

  1. Why?

  2. I'm using <UITextInput> to modify the text in a UITextField or UITextView. When I make a UITextRange with a length of 5 characters, it covers exactly (# ゚Д゚). So why does (# ゚Д゚) behave like a 5-character word in UITextField and UITextView, but like a 7-character word in NSString?

  3. How can I get the correct length of a string in this case?

YuAo asked Feb 18 '13 03:02



2 Answers

1) As many in the comments have already stated, your string is made of 5 composed character sequences (or character clusters, if you prefer). NSString's length method counts unichars (UTF-16 code units) instead, so it returns 7, the number of unichars it takes to represent your string in memory.

2) Apparently UITextField and UITextView handle strings in terms of composed character sequences rather than individual unichars. Good news: so can you. See #3.

3) You can get the number of composed character sequences by using the parts of the NSString API that deal properly with composed character sequences. A quick example is a small NSString category:

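#import <Foundation/Foundation.h>

// Declare the category so callers can see the method.
@interface NSString (ComposedCharacterSequences_helper)
- (NSUInteger)numberOfComposedCharacterSequences;
@end

@implementation NSString (ComposedCharacterSequences_helper)
- (NSUInteger)numberOfComposedCharacterSequences {
    __block NSUInteger count = 0;
    // The block runs once per composed character sequence, not once per unichar.
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                              NSLog(@"%@", substring); // Just for fun
                              count++;
                          }];
    return count;
}
@end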

Again, this is quick code, but it should get you started. If you use it like so:

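NSString *string = @"(# ゚Д゚)";
// NSUInteger is unsigned long on 64-bit platforms, so cast and print with %lu.
NSLog(@"string length %lu", (unsigned long)string.length);
NSLog(@"composed character count %lu", (unsigned long)[string numberOfComposedCharacterSequences]);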

You will see that you get the desired result: the length is 7, while the composed character count is 5.
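For question 2, it can also help to go the other way: from a number of composed character sequences to the NSRange of unichars they occupy. The following is only a sketch; RangeOfFirstComposedSequences is a hypothetical helper name, not part of Foundation, and it relies on -rangeOfComposedCharacterSequenceAtIndex: to find where each sequence ends.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): returns the range of UTF-16
// code units covered by the first `count` composed character sequences.
static NSRange RangeOfFirstComposedSequences(NSString *string, NSUInteger count)
{
    NSUInteger index = 0;
    while (count > 0 && index < string.length) {
        // Find the whole sequence containing `index`, then jump past it.
        NSRange sequence = [string rangeOfComposedCharacterSequenceAtIndex:index];
        index = NSMaxRange(sequence);
        count--;
    }
    return NSMakeRange(0, index);
}

int main(void)
{
    @autoreleasepool {
        NSString *string = @"(# ゚Д゚)";
        NSRange range = RangeOfFirstComposedSequences(string, 3);
        NSLog(@"%@ covers \"%@\"", NSStringFromRange(range),
              [string substringWithRange:range]);
    }
    return 0;
}
```

If count is larger than the number of sequences in the string, the helper simply returns the range of the whole string.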

For an in-depth explanation of the NSString API check out the WWDC 2012 Session 215 Video "Text and Linguistic Analysis"

NJones answered Oct 20 '22 14:10


Both " ゚" and "Д゚" are represented by a sequence of two Unicode characters each (even though each is visually presented as one). -[NSString length] reports the number of UTF-16 code units (unichars):

The number returned includes the individual characters of composed character sequences, so you cannot use this method to determine if a string will be visible when printed or how long it will appear.

If you want to see the individual UTF-16 code units:

#import <Foundation/Foundation.h>

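// Returns the UTF-16 code units of str as space-separated 4-digit hex values.
NSString* describeUnicodeCharacters(NSString* str)
{
    NSMutableString* codePoints = [NSMutableString string];
    for(NSUInteger i = 0; i < [str length]; ++i){
        // unichar is a 16-bit unsigned type; widen it to match %lX.
        unsigned long ch = (unsigned long)[str characterAtIndex:i];
        [codePoints appendFormat:@"%04lX ", ch];
    }
    return codePoints;
}


int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSString *s = @" ゚Д゚";
        // -length returns NSUInteger, so cast and print with %lu.
        NSLog(@"%lu unicode chars. bytes: %@",
            (unsigned long)[s length], describeUnicodeCharacters(s));
    }
    return 0;
}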

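The output is: 4 unicode chars. bytes: 0020 FF9F 0414 FF9F. That is a space (0020) and Д (0414), each followed by the halfwidth katakana semi-voiced sound mark (FF9F).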
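To see how those code units group into composed character sequences, the same dump can be bracketed per sequence. This is a rough sketch: DescribeComposedSequences is a hypothetical function name introduced here, built on -rangeOfComposedCharacterSequenceAtIndex:.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: lists the UTF-16 code units of str as hex, with one
// bracketed group per composed character sequence, e.g. "[0061] [0062]".
static NSString *DescribeComposedSequences(NSString *str)
{
    NSMutableArray *groups = [NSMutableArray array];
    NSUInteger i = 0;
    while (i < str.length) {
        NSRange sequence = [str rangeOfComposedCharacterSequenceAtIndex:i];
        NSMutableArray *units = [NSMutableArray array];
        for (NSUInteger j = sequence.location; j < NSMaxRange(sequence); ++j) {
            unsigned int unit = (unsigned int)[str characterAtIndex:j];
            [units addObject:[NSString stringWithFormat:@"%04X", unit]];
        }
        [groups addObject:[NSString stringWithFormat:@"[%@]",
                           [units componentsJoinedByString:@" "]]];
        i = NSMaxRange(sequence); // continue with the next sequence
    }
    return [groups componentsJoinedByString:@" "];
}

int main(void)
{
    @autoreleasepool {
        NSLog(@"%@", DescribeComposedSequences(@" ゚Д゚"));
    }
    return 0;
}
```

Each bracketed group corresponds to one unit that UITextField and UITextView treat as a single character.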

For 2) and 3), see NJones's answer.

Jano answered Oct 20 '22 16:10