
Java: how to find the most probable string in a list of strings?

I have a list of strings in Java containing a person's first name in several slightly different spellings. For example, John may be spelled Jon, Jawn, Jaun, etc. How should I retrieve the most appropriate string from this list? If anyone can suggest how to use Soundex in this case, it would be a great help.

jigsawmnc asked Sep 29 '12 06:09


2 Answers

You need an approximate string matching algorithm, and there are several strategies for implementing one. Blur is a trie-based Java implementation of approximate string matching based on the Levenshtein word distance.

Another strategy is the Boyer-Moore approximate string matching algorithm.

The usual approach to this kind of problem, using such an algorithm and the Levenshtein distance, is to compare the input against each possible output and choose the one with the smallest distance.
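As a rough illustration of that smallest-distance approach, here is a minimal sketch in plain Java. It does not use Blur or any other library. The class name `ClosestName`, the helper names `levenshtein` and `closest`, and the sample list of canonical names are all made up for this example, and the comparison is case-insensitive by assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, written only to illustrate the idea.
public class ClosestName {

    // Levenshtein distance via dynamic programming, keeping two rows in memory.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate with the smallest distance to the input
    // (the first one wins on ties), or null if the list is empty.
    static String closest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        String needle = input.toLowerCase(Locale.ROOT);
        for (String candidate : candidates) {
            int d = levenshtein(needle, candidate.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed list of canonical names; not part of the original question.
        List<String> names = Arrays.asList("John", "Mary", "Peter");
        System.out.println(closest("Jawn", names)); // prints "John"
    }
}
```

Ties go to the first candidate in the list, so for short names it may be worth adding a maximum acceptable distance before trusting the result.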

Aravind.HU answered Sep 30 '22 19:09


There is a jar file for approximate string matching.

Follow the link below and download frej.jar:

http://sourceforge.net/projects/frej/files/

This jar file contains the following method:

Fuzzy.equals("jon","john");

It returns true for approximately equal strings like these.

Dipen Jogi answered Sep 30 '22 19:09