
python search technology: word similarity

I want to get a similarity percentage for two words, e.g.:

abcd versus zzabcdzz == 50% similarity

It doesn't need to be very accurate. Is there any way to do that? I am using Python, but feel free to recommend other languages.

Bin Chen asked May 03 '26 04:05



1 Answer

Try using python-Levenshtein to calculate the edit distance.

The Levenshtein Python C extension module contains functions for fast computation of

  • Levenshtein (edit) distance, and edit operations
  • string similarity
  • approximate median strings, and generally string averaging
  • string sequence and set similarity

You can get a rough idea of similarity by dividing the edit distance between the two strings by the length of the longer string and subtracting the result from 1. In your example the edit distance is 4 (four insertions of "z"), and the maximum possible edit distance is 8 (the length of the longer string), so the similarity is 1 - 4/8 = 50%.
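As a rough illustration of that calculation, here is a minimal pure-Python sketch. The function names `levenshtein_distance` and `similarity` are made up for this example and are not part of python-Levenshtein; the distance is computed with the textbook dynamic-programming approach rather than the C extension, so it will be slower on long strings.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string in the outer loop

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string, in the range 0.0 to 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(f"{similarity('abcd', 'zzabcdzz'):.0%}")
```

If python-Levenshtein is installed, its `Levenshtein.distance(a, b)` function should be usable in place of the hand-written `levenshtein_distance` here.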

Mark Byers answered May 05 '26 16:05
