How to calculate Levenshtein Distance matrix of strings in Python
              str1    str2    str3    str4    ...     strn
      str1    0.8     0.4     0.6     0.1     ...     0.2
      str2    0.4     0.7     0.5     0.1     ...     0.1
      str3    0.6     0.5     0.6     0.1     ...     0.1
      str4    0.1     0.1     0.1     0.5     ...     0.6
      .       .       .       .       .       ...     .
      .       .       .       .       .       ...     .
      .       .       .       .       .       ...     .
      strn    0.2     0.1     0.1     0.6     ...     0.7
Using the distance function from the Levenshtein package, I can calculate the distance between two words. However, I have a single list containing n strings. I want to calculate the pairwise distance matrix for that list and then cluster the words based on it.
Here is my code:
import numpy as np
from Levenshtein import distance

Target = ['Tree', 'Trip', 'Treasure', 'Nothingtodo']
List1 = Target
List2 = Target
# np.int was removed from recent NumPy releases; the built-in int works as the dtype
Matrix = np.zeros((len(List1), len(List2)), dtype=int)
for i in range(len(List1)):
    for j in range(len(List2)):
        # Edit distance between the i-th and j-th strings
        Matrix[i, j] = distance(List1[i], List2[j])
print(Matrix)
[[ 0  2  4 11]
 [ 2  0  6 10]
 [ 4  6  0 11]
 [11 10 11  0]]
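
Since the matrix is symmetric with a zero diagonal, only the upper triangle needs to be computed. The following is a minimal sketch of that idea, not a definitive implementation. It makes several assumptions that are not in the post above: `levenshtein` is a pure-Python stand-in for `Levenshtein.distance` so the example runs without third-party packages, the `normalize` option divides by the longer string's length to get values between 0 and 1, and `threshold_clusters` is a hypothetical helper that groups words by a simple distance cutoff (a basic single-linkage style grouping, swapped in here for a full clustering algorithm).

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance using the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_matrix(words, normalize=False):
    """Symmetric pairwise matrix; each pair is computed only once."""
    n = len(words)
    matrix = [[0.0 if normalize else 0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = levenshtein(words[i], words[j])
        if normalize:
            # Assumed normalisation: divide by the longer string's length
            d = d / max(len(words[i]), len(words[j]), 1)
        matrix[i][j] = matrix[j][i] = d
    return matrix


def threshold_clusters(words, matrix, threshold):
    """Group words connected by a chain of pairs with distance <= threshold."""
    parent = list(range(len(words)))

    def find(x):
        # Follow parent links to the group's root, shortening the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(words)), 2):
        if matrix[i][j] <= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for idx, word in enumerate(words):
        groups.setdefault(find(idx), []).append(word)
    return list(groups.values())


target = ['Tree', 'Trip', 'Treasure', 'Nothingtodo']
raw = distance_matrix(target)
scaled = distance_matrix(target, normalize=True)
clusters = threshold_clusters(target, scaled, threshold=0.6)
print(raw)
print(clusters)
```

The cutoff of 0.6 is an arbitrary choice for this word list and would need tuning for real data. If SciPy is installed, a square matrix like this can instead be converted with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` and `fcluster` for proper hierarchical clustering.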