Levenshtein distance on non-English strings

Will the Levenshtein distance algorithm work well for non-English language strings too?

Update: Would this work automatically in a language like Java when comparing Asian characters?

Ryan Fernandes asked Feb 17 '10 11:02


2 Answers

Only if the language is written with an alphabet, for example Russian or German. For logographic scripts (Chinese, for example) or syllabic ones (like Lao), it will not.

Dewfy answered Oct 25 '22 20:10


Yes, but you have to treat each non-English character as one character, not as several units (which is what happens if you compare the raw UTF-8 bytes). For example, in Python you would use the unicode class to represent the string (and its characters).
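To illustrate the point, here is a minimal sketch in Python 3, where `str` plays the role of the `unicode` class mentioned above. The `levenshtein` helper and the sample strings are my own, not taken from any particular library. It runs the same function once over characters and once over the UTF-8 bytes of the same strings. The same caveat applies to Java: `String.length()` and `charAt()` work on UTF-16 code units, so characters outside the Basic Multilingual Plane count as two units unless you iterate with `codePoints()`.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (str, bytes, list, ...)."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


s1 = "日本語"
s2 = "日本人"

# Compared as Unicode strings: one character differs.
by_char = levenshtein(s1, s2)

# Compared as raw UTF-8 bytes: the one differing character
# is three bytes long, and all three bytes differ.
by_byte = levenshtein(s1.encode("utf-8"), s2.encode("utf-8"))

print(by_char, by_byte)  # prints "1 3"
```

Note that this only handles code points. A character written as a base letter plus a combining mark still counts as two items, so you may want to normalize the strings (for example with `unicodedata.normalize`) before comparing them.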

ondra answered Oct 25 '22 19:10