I have found some Python
implementations of the Levenshtein distance.
I am wondering though how these algorithms can be efficiently modified so that they break if the Levenshtein distance is greater than n
(e.g. 3) instead of running until the end?
So essentially I do not want to let the algorithm run for too long to calculate the final distance if I simply want to know if the distance is greater than a threshold or not.
I have found some relevant posts here:
but still, I do not see any Python code which does what I describe above (which is more or less what these posts describe too).
P.S.: The solution provided by @amirouche below is based on the fastest implementation that I have tested with some benchmarking (from here: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python, https://stackoverflow.com/a/32558749/9024698) and its bounded version is the fastest one of its kind from my tests (without excluding that there may be even faster ones).
As described in Levenstein distance limit, you can add a test over the row that is computed to return early:
def levenshtein(s1, s2, maximum):
if len(s1) > len(s2):
s1, s2 = s2, s1
distances = range(len(s1) + 1)
for i2, c2 in enumerate(s2):
distances_ = [i2+1]
for i1, c1 in enumerate(s1):
if c1 == c2:
distances_.append(distances[i1])
else:
distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
if all((x >= maximum for x in distances_)):
return False
distances = distances_
return distances[-1]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With