
Check how much a String sounds like another one in Java

Tags:

java

string

I'd like to know whether Java has a class that can measure, by its own criteria, how similar one String is to another. For example:

  • William Shakespeare / William Shakespeare : might be 100%
  • William Shakespe**a**re / William Shakespe**e**re : might score above 90%
  • William Shakespeare / Shakespeare, William : might score above 70% (just examples)
Llistes Sugra asked Mar 17 '10 09:03




2 Answers

I see two main candidates:

  • The Soundex encoding, implemented in Apache Commons Codec. Note, however, that it is mainly meant for single, relatively short words, so it won't find any similarity in your third example. It also only really works for English words.
  • The Levenshtein distance (also implemented in Apache Commons). This is language-agnostic, but the similarity for swapped parts, as in your third example, will be relatively low (more like 40%). Modifications such as the Damerau–Levenshtein distance may yield better results.
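To illustrate the first candidate, here is a minimal hand-rolled sketch of American Soundex. It is not the Apache Commons implementation; the class name `SoundexSketch` and the method `encode` are hypothetical names chosen for this example, and only the letters A to Z are handled.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not the Apache Commons class.
final class SoundexSketch {

    // Soundex digit for each letter A..Z; '0' means "no code" (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // assumption: silently skip anything that is not A..Z
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);       // the first letter is kept as-is
                lastCode = code;
            } else if (c != 'H' && c != 'W') {
                // H and W are transparent: they do not separate equal codes.
                if (code != '0' && code != lastCode) {
                    out.append(code);
                }
                lastCode = code;     // a vowel resets lastCode to '0'
            }
        }
        if (out.length() == 0) {
            return "";               // no letters at all
        }
        while (out.length() < 4) {
            out.append('0');         // pad short codes with zeros
        }
        return out.toString();
    }
}
```

With this sketch, `SoundexSketch.encode("Shakespeare")` and `SoundexSketch.encode("Shakespeere")` produce the same code, because only the first letter and the first three coded consonants contribute. That also shows why the encoding says little about a long multi-word string.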
Michael Borgwardt answered Sep 21 '22 19:09



In general, there is the Levenshtein distance, which counts how many character-level insert, substitute, or delete operations are needed to transform one string into another. The StringUtils class in Apache Commons Lang has an implementation.
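As a rough sketch of how such a distance can be turned into the percentage the question asks for, the following computes the edit distance with plain dynamic programming and normalises it by the longer string's length. The class name `LevenshteinSimilarity` and the normalisation formula are assumptions made for this example, not something the library prescribes.

```java
// Hypothetical helper for illustration only; not the Apache Commons class.
final class LevenshteinSimilarity {

    // Edit distance via dynamic programming, keeping only two rows of the table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalisation: 1.0 for identical strings, 0.0 when every character differs.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }
}
```

For the second example in the question, the distance is 1 over 19 characters, so `similarity` returns roughly 0.95. Swapped word order, as in the third example, costs many edits and scores much lower under this measure.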

soulmerge answered Sep 20 '22 19:09
