
Java. Ignore accents when comparing strings

The problem is simple: is there any function in Java that compares two Strings and returns true while ignoring accented characters?

For example:

String x = "Joao"; String y = "João";

should be reported as equal.

Thanks

framara asked Mar 03 '10 16:03





1 Answer

I think you should be using the Collator class. It allows you to set a strength and locale and it will compare characters appropriately.

From the Java 1.6 API:

You can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.
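If treating the two names as equal is really what you want, here is a minimal sketch of the strength setting described above. The class name, the helper equalsIgnoringAccents, and the choice of Locale.ROOT are my own assumptions for illustration, not part of the API quote. Note that PRIMARY strength also ignores case differences, so "joao" would match "João" as well.

```java
import java.text.Collator;
import java.util.Locale;

public class AccentInsensitiveCompare {

    // Hypothetical helper (not from the API docs): returns true when the two
    // strings have no PRIMARY-level difference, i.e. they differ at most by
    // accents or letter case.
    static boolean equalsIgnoringAccents(String a, String b) {
        // Locale.ROOT is an assumption; pass the locale your data is in.
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);
        // Also handle accents written as separate combining characters.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String x = "Joao";
        String y = "Jo\u00e3o"; // "João", escaped to avoid source-encoding issues
        System.out.println(equalsIgnoringAccents(x, y));      // true
        System.out.println(equalsIgnoringAccents(x, "John")); // false
        System.out.println(x.equals(y));                      // false
    }
}
```

Creating a Collator on every call keeps the sketch short; in real code you would probably build it once and reuse it.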

I think the important point here (which people are trying to make) is that "Joao" and "João" should never be considered as equal, but if you are sorting you don't want them compared by their raw character values (ã is not even an ASCII character, so it sorts after every unaccented letter), because then you would get something like Joao, John, João, which is not good. Using the Collator class handles this correctly.
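To illustrate the sorting point, here is a rough sketch comparing the natural String order with a Collator at its default strength. The class and method names are made up for this example, and Locale.ROOT is again an assumption.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CollatorSortSketch {

    // Natural String order: compares raw UTF-16 char values.
    static List<String> sortByCharValue(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Collator order: accents are only a secondary difference, so the
    // accented name stays next to its unaccented form.
    static List<String> sortWithCollator(List<String> names) {
        Collator collator = Collator.getInstance(Locale.ROOT); // assumed locale
        List<String> copy = new ArrayList<>(names);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // "Jo\u00e3o" is "João", escaped to avoid source-encoding issues.
        List<String> names = Arrays.asList("John", "Jo\u00e3o", "Joao");
        System.out.println(sortByCharValue(names));  // [Joao, John, João]
        System.out.println(sortWithCollator(names)); // [Joao, João, John]
    }
}
```

At the default strength the Collator still treats "Joao" and "João" as different strings; it only places them next to each other in the ordering.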

DaveJohnston answered Sep 21 '22 10:09
