How to compare character ignoring case in primitive types

I am writing these lines of code:

    String name1 = fname.getText().toString();
    String name2 = sname.getText().toString();
    aru = 0;

    count1 = name1.length();
    count2 = name2.length();
    for (i = 0; i < count1; i++) {
        for (j = 0; j < count2; j++) {
            if (name1.charAt(i) == name2.charAt(j))
                aru++;
        }
        if (aru != 0)
            aru++;
    }

I want to compare the Characters of two Strings ignoring the case. Simply using IgnoreCase doesn't work. Adding '65' ASCII value doesn't work either. How do I do this?

561

asked Apr 19 '12 07:04

2 Answers

The Character class in the Java API has several methods you can use.

You can convert the char on both sides to lowercase:

    Character.toLowerCase(name1.charAt(i)) == Character.toLowerCase(name2.charAt(j))

There are also methods you can use to check whether a letter is uppercase or lowercase:

    Character.isUpperCase('P')
    Character.isLowerCase('P')
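As a rough sketch of how that comparison could slot into the nested loops from the question: the class and method names below (CharCompare, sameIgnoringCase, countMatchingPairs) are made up for illustration, and the sketch only counts matching pairs, leaving out the question's extra increment after the inner loop.

```java
// Hypothetical helper class; the names are illustrative, not from the question.
class CharCompare {

    // True when the two chars are equal after lowercasing both sides.
    static boolean sameIgnoringCase(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Counts the (i, j) pairs whose characters match ignoring case,
    // using the same nested loops as the question.
    static int countMatchingPairs(String name1, String name2) {
        int matches = 0;
        for (int i = 0; i < name1.length(); i++) {
            for (int j = 0; j < name2.length(); j++) {
                if (sameIgnoringCase(name1.charAt(i), name2.charAt(j))) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCase('p', 'P'));         // true
        System.out.println(countMatchingPairs("Anna", "NAN"));  // 6
    }
}
```

This works well for plain ASCII letters. It does not cover every Unicode letter.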

158

answered Sep 23 '22 10:09

You can't actually do the job quite right with toLowerCase, either on a string or on a character. The problem is that some letters have variant glyphs in upper or lower case, and depending on whether you uppercase or lowercase, those variants may or may not be preserved. It's not even clear what it means to compare two variants of a lower-case glyph ignoring case: are they the same or not? (Note that there are also mixed-case glyphs: \u01c5, \u01c8, \u01cb, \u01f2, or ǅ, ǈ, ǋ, ǲ. Any method suggested here works on those, as long as they should count as the same as their fully upper-case or fully lower-case variants.)

There is an additional problem with using Char: there are some 80 code points not representable with a single Char that are upper/lower case variants (40 of each), at least as detected by Java's code point upper/lower casing. You therefore need to get the code points and change the case on these.
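To make the code point issue concrete, here is a minimal Java sketch. The class and method names are hypothetical, and the Deseret letters U+10400 and U+10428 are just one example of a cased pair outside the range of a single char. It uppercases code points rather than chars, using the int overload of Character.toUpperCase.

```java
import java.util.Arrays;

// Hypothetical example class; the names are illustrative only.
class CodePointCase {

    // Uppercases each code point (not each char) and compares the results.
    static boolean sameIgnoringCase(String a, String b) {
        int[] left = a.codePoints().map(Character::toUpperCase).toArray();
        int[] right = b.codePoints().map(Character::toUpperCase).toArray();
        return Arrays.equals(left, right);
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        String lower = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I

        // Each letter takes two chars (a surrogate pair).
        System.out.println(upper.length());                    // 2

        // Code point casing sees them as the same letter.
        System.out.println(sameIgnoringCase(upper, lower));    // true

        // Char casing leaves the surrogate halves untouched, so they differ.
        System.out.println(
            Character.toUpperCase(lower.charAt(1)) == upper.charAt(1)); // false
    }
}
```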

But code points don't help with the variant glyphs.

Anyway, here's a complete list of the glyphs that are problematic due to variants, showing how they fare against 6 variant methods:

1. Character toLowerCase
2. Character toUpperCase
3. String toLowerCase
4. String toUpperCase
5. String equalsIgnoreCase
6. Character toLowerCase(toUpperCase) (or vice versa)

For these methods, S means that the variants are treated the same as each other, D means the variants are treated as different from each other.

    Behavior     Unicode                             Glyphs
    ===========  ==================================  =========
    1 2 3 4 5 6  Upper  Lower  Var Up Var Lo Vr Lo2  U L u l l2
    - - - - - -  ------ ------ ------ ------ ------  - - - - -
    D D D D S S  \u0049 \u0069 \u0130 \u0131         I i İ ı
    S D S D S S  \u004b \u006b \u212a                K k K
    D S D S S S  \u0053 \u0073        \u017f         S s   ſ
    D S D S S S  \u039c \u03bc        \u00b5         Μ μ   µ
    S D S D S S  \u00c5 \u00e5 \u212b                Å å Å
    D S D S S S  \u0399 \u03b9        \u0345 \u1fbe  Ι ι   ͅ ι
    D S D S S S  \u0392 \u03b2        \u03d0         Β β   ϐ
    D S D S S S  \u0395 \u03b5        \u03f5         Ε ε   ϵ
    D D D D S S  \u0398 \u03b8 \u03f4 \u03d1         Θ θ ϴ ϑ
    D S D S S S  \u039a \u03ba        \u03f0         Κ κ   ϰ
    D S D S S S  \u03a0 \u03c0        \u03d6         Π π   ϖ
    D S D S S S  \u03a1 \u03c1        \u03f1         Ρ ρ   ϱ
    D S D S S S  \u03a3 \u03c3        \u03c2         Σ σ   ς
    D S D S S S  \u03a6 \u03c6        \u03d5         Φ φ   ϕ
    S D S D S S  \u03a9 \u03c9 \u2126                Ω ω Ω
    D S D S S S  \u1e60 \u1e61        \u1e9b         Ṡ ṡ   ẛ
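Two rows of the table can be spot-checked with a short Java sketch. The class and helper names are hypothetical. It covers only the Kelvin sign (\u212a) and the long s (\u017f), and only the Character and equalsIgnoreCase methods.

```java
// Hypothetical spot check for two rows of the table; names are illustrative.
class VariantCheck {

    // Method 1: compare after Character.toLowerCase.
    static boolean sameLower(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Method 2: compare after Character.toUpperCase.
    static boolean sameUpper(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Method 5: compare one-character strings with equalsIgnoreCase.
    static boolean sameString(char a, char b) {
        return String.valueOf(a).equalsIgnoreCase(String.valueOf(b));
    }

    public static void main(String[] args) {
        char kelvin = '\u212a'; // KELVIN SIGN, an uppercase variant of K
        char longS = '\u017f';  // LATIN SMALL LETTER LONG S, a lowercase variant of s

        System.out.println(sameLower('K', kelvin));  // true  (S in column 1)
        System.out.println(sameUpper('K', kelvin));  // false (D in column 2)
        System.out.println(sameString('K', kelvin)); // true  (S in column 5)

        System.out.println(sameLower('s', longS));   // false (D in column 1)
        System.out.println(sameUpper('s', longS));   // true  (S in column 2)
        System.out.println(sameString('s', longS));  // true  (S in column 5)
    }
}
```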

Complicating this still further, there is no way to get the Turkish I's right (i.e. the dotted versions are different from the undotted versions) unless you know you're in Turkish. None of these methods gives correct behavior, and none can unless you know the locale (non-Turkish: i and I are the same ignoring case; Turkish: they are not).
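If you do know the locale, the String casing methods that take a Locale can apply the Turkish rules. The sketch below is only an illustration with hypothetical names. It compares whole strings after toUpperCase(locale), which is one possible approach and not something the methods listed above do for you.

```java
import java.util.Locale;

// Hypothetical example class; the names are illustrative only.
class TurkishI {

    // Compares two strings after uppercasing both under the given locale.
    static boolean sameIgnoringCase(String a, String b, Locale locale) {
        return a.toUpperCase(locale).equals(b.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Outside Turkish, i and I are the same letter ignoring case.
        System.out.println(sameIgnoringCase("i", "I", Locale.ROOT));   // true

        // In Turkish, "i" uppercases to the dotted capital \u0130, not to "I".
        System.out.println(sameIgnoringCase("i", "I", turkish));       // false
        System.out.println(sameIgnoringCase("i", "\u0130", turkish));  // true

        // The dotless \u0131 pairs with the plain capital I instead.
        System.out.println(sameIgnoringCase("\u0131", "I", turkish));  // true
    }
}
```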

Overall, using toUpperCase gives you the closest approximation, since you have only five uppercase variants (or four, not counting Turkish).

You can also try to specifically intercept those five troublesome cases and call toUpperCase(toLowerCase(c)) on them alone. If you choose your guards carefully (just toUpperCase if c < 0x130 || c > 0x212B, then work through the other alternatives) you can get only a ~20% speed penalty for characters in the low range (as compared to ~4x if you convert single characters to strings and equalsIgnoreCase them) and only about a 2x penalty if you have a lot in the danger zone. You still have the locale problem with dotted I, but otherwise you're in decent shape. Of course if you can use equalsIgnoreCase on a larger string, you're better off doing that.

Here is sample Scala code that does the job:

    def elevateCase(c: Char): Char = {
      if (c < 0x130 || c > 0x212B) Character.toUpperCase(c)
      else if (c == 0x130 || c == 0x3F4 || c == 0x2126 || c >= 0x212A)
        Character.toUpperCase(Character.toLowerCase(c))
      else Character.toUpperCase(c)
    }
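Since the question is about Java, here is a direct Java port of the Scala function above, offered as a sketch. The wrapper class name and the sameIgnoringCase helper are additions for illustration. The guards are copied unchanged from the Scala version.

```java
// Java port of the Scala elevateCase above; the class name and the
// sameIgnoringCase helper are hypothetical additions.
class ElevateCase {

    static char elevateCase(char c) {
        if (c < 0x130 || c > 0x212B) {
            // Fast path: no troublesome uppercase variants outside this range.
            return Character.toUpperCase(c);
        } else if (c == 0x130 || c == 0x3F4 || c == 0x2126 || c >= 0x212A) {
            // Uppercase variants: fold down first, then back up.
            return Character.toUpperCase(Character.toLowerCase(c));
        } else {
            return Character.toUpperCase(c);
        }
    }

    // Two chars count as equal when their elevated forms match.
    static boolean sameIgnoringCase(char a, char b) {
        return elevateCase(a) == elevateCase(b);
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCase('k', '\u212a'));      // true (Kelvin sign)
        System.out.println(sameIgnoringCase('s', '\u017f'));      // true (long s)
        System.out.println(sameIgnoringCase('\u03b8', '\u03f4')); // true (theta variants)
        System.out.println(sameIgnoringCase('a', 'b'));           // false
    }
}
```

As with the Scala version, this still treats the dotted and dotless I as the same letter, so the Turkish locale problem remains.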

answered Sep 21 '22 10:09

Rex Kerr


Tags: java, string, case-insensitive, case-sensitive, character

Asked by Arush Kamboj. Answered by Shehzad and Rex Kerr.