 

Why is the same character compared twice by changing its case to UPPER and then to lower?

The code below is from the String class in Java. I don't understand why the characters of the two strings are compared twice: first after converting both to upper case and then, if they still differ, after converting both to lower case.

My question is: is this required? If so, why?

    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();
    private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }
    }
Tushar Banne asked Jan 05 '16 14:01




1 Answer

The issue is more complex than it looks.

There are characters for which several lowercase code points map to the same uppercase code point, or vice versa. So to check for a case-insensitive match, you need to compare both the uppercase and the lowercase versions and treat the characters as equal if either comparison matches.

One example:

The Greek upper-case letter "Σ" has two different lower-case forms: "ς" in word-final position and "σ" elsewhere.

Source: Wikipedia
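
To make the sigma case concrete, here is a small sketch. The class and helper names (SigmaCaseDemo, sharesUpperCase, sharesLowerCase) are made up for illustration; only Character.toUpperCase, Character.toLowerCase and String.CASE_INSENSITIVE_ORDER come from the JDK.

```java
public class SigmaCaseDemo {
    // Hypothetical helper: do the two chars map to the same uppercase char?
    static boolean sharesUpperCase(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: do the two chars map to the same lowercase char?
    static boolean sharesLowerCase(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char finalSigma = '\u03C2'; // ς, used in word-final position
        char sigma = '\u03C3';      // σ, used elsewhere

        // Both are already lowercase, so lowercasing keeps them distinct.
        System.out.println("same lowercase: " + sharesLowerCase(finalSigma, sigma));

        // Uppercasing maps both to Σ (U+03A3), which is where they meet.
        System.out.println("same uppercase: " + sharesUpperCase(finalSigma, sigma));

        // The comparator from the question therefore reports a match (0).
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("\u03C2", "\u03C3"));
    }
}
```

A comparison that only lowercased the characters would treat "ς" and "σ" as different, which is why the uppercase step is needed.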

For the opposite situation, where the uppercase forms differ but the lowercase forms are equal, VGR supplied this excellent example:

A better example would be '\u0130' (İ) and 'I'. Passing them through toUpperCase leaves them unchanged (and therefore different), but passing them through toLowerCase results in identical character values.
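
The three-step check from the question can be pulled out into a single char-level method to see how this example passes. This is only a sketch: CharFoldDemo and matchesIgnoringCase are hypothetical names, not part of the JDK, and the method simply mirrors the nested ifs in the quoted compare loop.

```java
public class CharFoldDemo {
    // Hypothetical helper mirroring the nested checks in the quoted compare():
    // exact match first, then uppercase forms, then lowercase forms.
    static boolean matchesIgnoringCase(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true;
        }
        // Last resort: lowercase the already-uppercased chars, as the JDK code does.
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    public static void main(String[] args) {
        char dottedI = '\u0130'; // İ
        char plainI = 'I';

        // Uppercasing leaves both unchanged, so this comparison fails.
        System.out.println(Character.toUpperCase(dottedI) == Character.toUpperCase(plainI));

        // Lowercasing maps both to 'i', so the final step succeeds.
        System.out.println(Character.toLowerCase(dottedI) == Character.toLowerCase(plainI));

        System.out.println(matchesIgnoringCase(dottedI, plainI));
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("\u0130", "I"));
    }
}
```

Dropping the lowercase step would make İ and I compare as different, while dropping the uppercase step would break pairs such as "ς" and "σ", so neither pass alone is enough.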

Jan answered Nov 15 '22 05:11