 

How to calculate the Levenshtein distance between two .txt files? [closed]

Is there a standard Linux command for it? If not, can anyone describe a Python script that does the same?

asked Oct 19 '25 20:10 by blastoise


1 Answer

It depends. When the OCR outputs are similar and only a few differences are expected, you could split the texts and compare them word by word or line by line, and only compute the Levenshtein distance for the parts in which differences occur. Pairing lines up this way only works when both texts have the same number of lines. E.g.:

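from Levenshtein import distance  # assumption: distance() comes from the Levenshtein package on PyPI

def textLevi(txt1, txt2):
    # Pair up corresponding lines; zip() stops at the shorter text,
    # so both texts should have the same number of lines.
    lines = list(zip(txt1.split("\n"), txt2.split("\n")))
    total = 0  # running sum of per-line distances (kept separate from the distance() function)
    for i, (line1, line2) in enumerate(lines, 1):
        # Only compute the edit distance for lines that actually differ.
        if line1 != line2:
            act_distance = distance(line1, line2)
            print(f"Distance of line {i}: {act_distance}")
            total += act_distance
    print("Sum of Lv Distances:", total)
    return total

textLevi("Hello I \n like cheese", "Hello I \n like cheddar")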

produces the output:

Distance of line 2: 4

Sum of Lv Distances: 4
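The answer relies on a `distance` function that it does not define. If you would rather not install a package, here is a minimal, dependency-free sketch of the classic dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1), plus a helper that compares the full contents of two .txt files character by character. The names `levenshtein`, `file_distance` and `levenshtein_files.py` are hypothetical, and the sketch assumes both files are UTF-8 text.

```python
import sys


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    # Keep the shorter string in the inner loop so each DP row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def file_distance(path1: str, path2: str) -> int:
    """Character-level Levenshtein distance between two whole text files (assumes UTF-8)."""
    with open(path1, encoding="utf-8") as f1, open(path2, encoding="utf-8") as f2:
        return levenshtein(f1.read(), f2.read())


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(file_distance(sys.argv[1], sys.argv[2]))
    else:
        print("usage: python levenshtein_files.py FILE1 FILE2")
```

For example, `levenshtein("cheese", "cheddar")` returns 4, the same distance reported for line 2 above. The whole-file comparison takes time proportional to the product of the two file lengths, so for long OCR outputs the line-by-line approach, which only runs the distance on lines that differ, is usually much cheaper.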

answered Oct 22 '25 12:10 by Luxusproblem