
String Distance Matrix in Python

How can I calculate a Levenshtein distance matrix for a list of strings in Python?

              str1    str2    str3    str4    ...     strn
      str1    0.8     0.4     0.6     0.1     ...     0.2
      str2    0.4     0.7     0.5     0.1     ...     0.1
      str3    0.6     0.5     0.6     0.1     ...     0.1
      str4    0.1     0.1     0.1     0.5     ...     0.6
      .       .       .       .       .       ...     .
      .       .       .       .       .       ...     .
      .       .       .       .       .       ...     .
      strn    0.2     0.1     0.1     0.6     ...     0.7

The distance function computes the distance between two words, but I have a single list containing n strings. I want to calculate the distance matrix for the whole list and then cluster the words.

Ajay Jadhav asked May 25 '16 06:05



1 Answer

Here is my code:

import numpy as np
from Levenshtein import distance  # third-party Levenshtein package

Target = ['Tree', 'Trip', 'Treasure', 'Nothingtodo']

# Compare the list against itself to get a square matrix
List1 = Target
List2 = Target

# Integer matrix that will hold the pairwise distances
Matrix = np.zeros((len(List1), len(List2)), dtype=int)

for i in range(len(List1)):
    for j in range(len(List2)):
        Matrix[i, j] = distance(List1[i], List2[j])

print(Matrix)

[[ 0  2  4 11]
 [ 2  0  6 10]
 [ 4  6  0 11]
 [11 10 11  0]]
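
If installing the Levenshtein package is not an option, or you also want the clustering step from the question, the sketch below may help. It is only an illustration under a few assumptions: the helper names `levenshtein`, `distance_matrix` and `threshold_clusters` are made up for this example, the edit distance is a plain dynamic-programming version rather than the library's implementation, and the clustering is a very simple "link everything within `max_dist`" grouping with an arbitrarily chosen threshold of 4, not a recommendation of a particular clustering method.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_matrix(words):
    """Fill only the upper triangle and mirror it, since d(a, b) == d(b, a)."""
    n = len(words)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(words[i], words[j])
    return matrix


def threshold_clusters(words, matrix, max_dist):
    """Group words connected by a chain of pairs with distance <= max_dist."""
    parent = list(range(len(words)))

    def find(x):
        # Follow parent links to the group's representative, shortening the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if matrix[i][j] <= max_dist:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for idx, word in enumerate(words):
        groups.setdefault(find(idx), []).append(word)
    return list(groups.values())


words = ['Tree', 'Trip', 'Treasure', 'Nothingtodo']
matrix = distance_matrix(words)
clusters = threshold_clusters(words, matrix, max_dist=4)  # threshold is an assumption
print(matrix)
print(clusters)
```

Because the matrix is symmetric with a zero diagonal, only about half of the pairs need to be computed, which matters once the list grows. For real clustering work you would probably pass the matrix to a dedicated clustering library instead of the threshold grouping used here.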
pratiksha answered Sep 16 '22 11:09
