(# ゚Д゚) is a 5-letter-word. But in iOS, [@"(# ゚Д゚)" length] is 7.
Why?
I'm using <UITextInput> to modify the text in a UITextField or UITextView. When I create a UITextRange that is 5 characters long, it covers exactly the (# ゚Д゚). So why does (# ゚Д゚) look like a 5-character word in UITextField and UITextView, but like a 7-character word in NSString?
How can I get the correct length of a string in this case?
1) As many in the comments have already stated, your string is made of 5 composed character sequences (or character clusters, if you prefer). When it is broken down by unichars, as NSString's length method does, you get 7, which is the number of unichars it takes to represent your string in memory.
2) Apparently UITextField and UITextView handle strings in a way that is aware of composed character sequences. Good news: so can you. See #3.
3) You can get the number of composed character sequences by using the parts of the NSString API that deal properly with composed character sequences. A quick example I baked up, very quickly, is a small NSString category:
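#import <Foundation/Foundation.h>

@interface NSString (ComposedCharacterSequences_helper)
- (NSUInteger)numberOfComposedCharacterSequences;
@end

@implementation NSString (ComposedCharacterSequences_helper)

- (NSUInteger)numberOfComposedCharacterSequences {
    __block NSUInteger count = 0;
    // Visit each composed character sequence (user-perceived character) once.
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSLog(@"%@", substring); // Just for fun
        count++;
    }];
    return count;
}

@end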
Again this is quick code; but it should get you started. And if you use it like so:
NSString *string = @"(# ゚Д゚)";
NSLog(@"string length %lu", (unsigned long)string.length);
NSLog(@"composed character count %lu", (unsigned long)[string numberOfComposedCharacterSequences]);
You will see that the length is 7 and the composed character count is 5, which is the desired result.
For an in-depth explanation of the NSString API, check out the WWDC 2012 Session 215 video, "Text and Linguistic Analysis".
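If you would rather not use a block, the same count can be sketched with a plain loop. This is only an illustration, not part of the answer above: `CountComposedCharacterSequences` is a hypothetical helper name, and it relies on NSString's `rangeOfComposedCharacterSequenceAtIndex:` to jump from the start of one sequence to the start of the next.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption, not from the answer above):
// counts composed character sequences by hopping from one sequence to the next.
static NSUInteger CountComposedCharacterSequences(NSString *string) {
    NSUInteger count = 0;
    NSUInteger index = 0;
    while (index < string.length) {
        // Range of the whole sequence that contains the unichar at `index`.
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
        index = NSMaxRange(range); // First unichar after this sequence.
        count++;
    }
    return count;
}
```

Calling `CountComposedCharacterSequences(@"(# ゚Д゚)")` should agree with the category method above, since both walk the string one composed character sequence at a time.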
Both " ゚" (a space followed by ゚) and "Д゚" are each represented by a sequence of two Unicode characters (even though each is visually presented as one). -[NSString length] reports the number of Unicode chars (UTF-16 code units):
The number returned includes the individual characters of composed character sequences, so you cannot use this method to determine if a string will be visible when printed or how long it will appear.
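One practical consequence of that note is that an index measured in unichars can land in the middle of a composed character sequence. As a rough sketch (the helper name `SafePrefix` and the idea of taking a prefix are assumptions made for illustration), NSString's `rangeOfComposedCharacterSequencesForRange:` can widen a raw range so that it only ever covers whole sequences:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a prefix of `string` that starts from the
// first `maxUnits` unichars and is widened, if needed, so that it never
// ends inside a composed character sequence.
static NSString *SafePrefix(NSString *string, NSUInteger maxUnits) {
    if (maxUnits == 0) {
        return @"";
    }
    if (maxUnits >= string.length) {
        return string;
    }
    NSRange raw = NSMakeRange(0, maxUnits);
    // Expand the raw unichar range to whole composed character sequences.
    NSRange safe = [string rangeOfComposedCharacterSequencesForRange:raw];
    return [string substringWithRange:safe];
}
```

Note that the result can be longer than `maxUnits` unichars, because the range is widened rather than shrunk when it would otherwise split a sequence.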
If you want to see the underlying representation (the values printed are UTF-16 code units, not raw bytes):
#import <Foundation/Foundation.h>

// Returns the UTF-16 code units of str as space-separated hex values.
NSString *describeUnicodeCharacters(NSString *str)
{
    NSMutableString *codePoints = [NSMutableString string];
    for (NSUInteger i = 0; i < [str length]; ++i) {
        unsigned long ch = (unsigned long)[str characterAtIndex:i];
        [codePoints appendFormat:@"%04lX ", ch];
    }
    return codePoints;
}

int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSString *s = @" ゚Д゚";
        NSLog(@"%lu unicode chars. bytes: %@",
              (unsigned long)[s length], describeUnicodeCharacters(s));
    }
    return 0;
}
The output is: 4 unicode chars. bytes: 0020 FF9F 0414 FF9F
2) and 3): what NJones said.