Is there a standard Linux command for it? If not, can anyone describe a Python script to do the same?
It depends. When the OCR outputs are similar and only a few differences are expected, you could split both texts and compare them word by word or line by line, using the Levenshtein distance only for the parts in which differences occur. This works when both texts have the same number of lines, e.g.:
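# Assumption: distance() comes from the third-party "Levenshtein" package
from Levenshtein import distance

def textLevi(txt1, txt2):
    # Pair up the lines of both texts (assumes both have the same number of lines)
    lines = list(zip(txt1.split("\n"), txt2.split("\n")))
    total = 0  # separate name, so the distance() function is not shadowed
    for i, ele in enumerate(lines, 1):
        line1, line2 = ele
        if line1 != line2:
            # Only lines that differ are passed to the Levenshtein function
            actDistance = distance(line1, line2)
            print("Distance of line %d:" % i, actDistance)
            total += actDistance
    print("Sum of Lv Distances:", total)

textLevi("Hello I \n like cheese", "Hello I \n like cheddar")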
This would produce the output:
Distance of line 2: 4
Sum of Lv Distances: 4
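The function above calls a distance() helper that is not part of the Python standard library. If installing an extra package is not an option, the same line-by-line idea can be sketched with a small pure-Python Levenshtein implementation. The names levenshtein and line_distances are made up for this illustration, and raising an error on differing line counts is just one possible choice, not something the approach requires:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def line_distances(txt1: str, txt2: str) -> dict:
    """Map 1-based line number -> distance, for lines that differ."""
    lines1, lines2 = txt1.split("\n"), txt2.split("\n")
    if len(lines1) != len(lines2):
        # Assumption: the line-wise shortcut only makes sense for equal line counts
        raise ValueError("line counts differ; compare the whole texts instead")
    return {
        i: levenshtein(l1, l2)
        for i, (l1, l2) in enumerate(zip(lines1, lines2), 1)
        if l1 != l2
    }


result = line_distances("Hello I \n like cheese", "Hello I \n like cheddar")
print(result, sum(result.values()))
```

Returning a dictionary instead of printing makes it easier to reuse the per-line distances, for example to flag only the lines whose distance exceeds some threshold.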