
Is uppercase string always of the same length as the original one?

Tags: c#, unicode

  • Is the length of a Unicode string converted to uppercase always the same as the length of the original string, no matter what culture is used?

  • Is the length of a Unicode string converted to lowercase always the same as the length of the original string, no matter what culture is used?

In other words, is the following true in C#?

text.ToUpper(CultureInfo.CurrentCulture).Length == text.Length
text.ToLower(CultureInfo.CurrentCulture).Length == text.Length

Note that I'm not interested in the number of bytes: that question has already been answered.

Arseni Mourzenko asked Nov 30 '13 14:11



1 Answer

The answers to the two questions are “No” and “Yes”, respectively, as far as the Unicode Standard is concerned.

For example, when converting to uppercase, “ß” U+00DF LATIN SMALL LETTER SHARP S is mapped to two characters “SS” by Unicode mapping rules. It is possible to map it to the single character “ẞ” U+1E9E LATIN CAPITAL LETTER SHARP S, but that’s not the default (and not common at all). Another example is that “ﬁ” U+FB01 LATIN SMALL LIGATURE FI is mapped to “FI”.

In the opposite direction, there is no default mapping that would change the number of characters. See Character Properties, Case Mappings & Names FAQ, which links to the file SpecialCasing.txt that contains all deviations from simple one-to-one mappings. The only rules there that would make a lowercase string differ in length from its uppercase original are a few optional rules related to Lithuanian practices.
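
As a quick illustration of the uppercase examples above, here is a minimal sketch in Python 3 (not C#), chosen because its str.upper() applies the Unicode full case mappings. The helper name upper_lengths is made up for this illustration. Whether ToUpper in a given .NET version applies these multi-character mappings is a separate matter that this sketch does not settle.

```python
# Illustrative only: Python 3, not C#. str.upper() uses Unicode full case
# mappings, so one character can become two.

def upper_lengths(text):
    """Return (original length, uppercased length), counted in code points."""
    return len(text), len(text.upper())

# "ß" (U+00DF), the "ﬁ" ligature (U+FB01), and a word containing "ß".
samples = ["\u00df", "\ufb01", "stra\u00dfe"]

for s in samples:
    before, after = upper_lengths(s)
    print(f"{s!r} -> {s.upper()!r}: length {before} -> {after}")
```

Python's len() counts code points while C#'s Length counts UTF-16 code units. For the characters used here the two counts agree, since all of them are in the Basic Multilingual Plane.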

Jukka K. Korpela answered Sep 25 '22 21:09