Levenshtein algorithm with custom character mapping

I want to use the Levenshtein algorithm to search a list of strings. I want to implement a custom character mapping so that I can type Latin characters and match items written in Greek.

Mapping example:

a = α, ά
b = β
i = ι,ί,ΐ,ϊ
... (etc)
u = ου, ού

So searching for abu in a list containing

  • αbu
  • abού
  • αού (all greek characters)

should return every item in the list. (Item order does not matter.)

How do I apply such a mapping in the algorithm? (This is where I am starting from.)

Odys asked Mar 22 '12 11:03



1 Answer

I think the best approach is to preprocess your symbols into one canonical form (e.g. all Latin) and then use Levenshtein as you normally would.

In pseudocode:

int func(String latinStr, String greekStr) {
   // Map the Greek input to Latin first, e.g. αβ becomes ab
   String mappedStr = convertToLatin(greekStr);
   // Then compare both strings in the same alphabet
   return levenshtein(latinStr, mappedStr);
}

In convertToLatin, you can walk the string symbol by symbol, look each symbol up in a dictionary of mappings, and build the new string from the replacements.
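To make the idea concrete, here is a minimal sketch in Python. The mapping table, the accent-stripping step, and all function names (`GREEK_TO_LATIN`, `strip_accents`, `convert_to_latin`, `levenshtein`, `mapped_distance`) are my own assumptions for illustration, built only from the examples in the question. Because u maps to the two-character sequence ου, the lookup tries a two-character match before falling back to a single character.

```python
import unicodedata

# Hypothetical mapping built from the question's examples; extend as needed.
# Accented forms (ά, ί, ΐ, ϊ, ού) are handled by strip_accents below.
GREEK_TO_LATIN = {
    "ου": "u",  # digraph, must be tried before single letters
    "α": "a",
    "β": "b",
    "ι": "i",
}


def strip_accents(text: str) -> str:
    """Decompose characters and drop combining marks (tonos, dialytika)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def convert_to_latin(text: str) -> str:
    """Replace mapped Greek symbols with Latin ones; keep everything else."""
    text = strip_accents(text.lower())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in GREEK_TO_LATIN:
            out.append(GREEK_TO_LATIN[pair])
            i += 2
        else:
            # Unmapped characters (including Latin ones) pass through unchanged.
            out.append(GREEK_TO_LATIN.get(text[i], text[i]))
            i += 1
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def mapped_distance(latin_query: str, candidate: str) -> int:
    """Distance between a Latin query and a (possibly Greek) candidate."""
    return levenshtein(latin_query.lower(), convert_to_latin(candidate))
```

With this in place you would rank the list by `mapped_distance(query, item)` and keep the items under whatever threshold suits you. Converting every list item once up front and caching the result avoids redoing the mapping on each search.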

om-nom-nom answered Nov 08 '22 02:11
