I'm trying to calculate the Levenshtein distance between two Pandas columns, but I'm getting stuck. The library I'm using is textdistance. Here is a minimal, reproducible example:
import pandas as pd
from textdistance import levenshtein
attempts = [['passw0rd', 'pasw0rd'],
['passwrd', 'psword'],
['psw0rd', 'passwor']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])
password attempt
0 passw0rd pasw0rd
1 passwrd psword
2 psw0rd passwor
My poor attempt:
df.apply(lambda x: levenshtein.distance(*zip(x['password'] + x['attempt'])), axis=1)
This is how the function works. It takes two strings as arguments:
levenshtein.distance('helloworld', 'heloworl')
Out[1]: 2
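To see why the attempt does not work, here is what its argument expression, *zip(x['password'] + x['attempt']), evaluates to for the first row. This is a stdlib-only sketch that uses two plain strings in place of a pandas row:

```python
# Mimic the first row of the DataFrame with plain strings.
password, attempt = 'passw0rd', 'pasw0rd'

# '+' on two strings concatenates them into ONE string.
concatenated = password + attempt

# zip() over a single iterable yields one 1-tuple per character.
args = list(zip(concatenated))

print(concatenated)  # passw0rdpasw0rd
print(len(args))     # 15
print(args[:3])      # [('p',), ('a',), ('s',)]
```

Unpacking that with * hands levenshtein.distance 15 one-character tuples as separate arguments, rather than the two strings the function expects.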
The Levenshtein distance is usually computed by building a matrix of size (M+1)x(N+1), where M and N are the lengths of the two strings, and filling it with two nested for loops: each cell is derived from its three already-filled neighbors (above, left, and diagonally above-left).
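A minimal sketch of that matrix approach (the classic Wagner-Fischer dynamic program) in plain Python. The helper name levenshtein_distance is hypothetical and only for illustration; it is not part of textdistance:

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Levenshtein distance via a full (M+1)x(N+1) dynamic-programming matrix."""
    m, n = len(s1), len(s2)
    # d[i][j] = distance between the first i chars of s1 and the first j chars of s2
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # turning s1[:i] into "" takes i deletions
    for j in range(n + 1):
        d[0][j] = j  # turning "" into s2[:j] takes j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete s1[i-1]
                d[i][j - 1] + 1,         # insert s2[j-1]
                d[i - 1][j - 1] + cost,  # substitute (free if characters match)
            )
    return d[m][n]


print(levenshtein_distance('helloworld', 'heloworl'))  # 2
```

This runs in O(M*N) time and memory; only the previous row is ever read, so the matrix can be shrunk to two rows if memory matters.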
The Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, or substitutions needed to transform one string, string1, into another, string2. For example, 'pasw0rd' can be turned into 'passw0rd' by inserting an 's', so their distance is 1.
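Written as a recurrence, the definition fills the (M+1)x(N+1) matrix cell by cell. Here a is string1 (length M), b is string2 (length N), d_{i,j} is the distance between the first i characters of a and the first j characters of b (1 ≤ i ≤ M, 1 ≤ j ≤ N), and [a_i ≠ b_j] is 1 when the characters differ and 0 otherwise:

```latex
d_{i,0} = i, \qquad d_{0,j} = j, \qquad
d_{i,j} = \min\begin{cases}
  d_{i-1,j} + 1 & \text{(deletion)} \\
  d_{i,j-1} + 1 & \text{(insertion)} \\
  d_{i-1,j-1} + [a_i \neq b_j] & \text{(substitution or match)}
\end{cases}
\qquad \operatorname{lev}(a, b) = d_{M,N}
```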
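Maybe I'm missing something, but is there a reason you don't like the lambda expression? The problem is the *zip(x['password'] + x['attempt']) part: + concatenates the two strings into one, and *zip(...) then passes each character as a separate argument. Passing the two column values directly works for me: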
import pandas as pd
from textdistance import levenshtein
attempts = [['passw0rd', 'pasw0rd'],
['passwrd', 'psword'],
['psw0rd', 'passwor'],
['helloworld', 'heloworl']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])
# Pass the two strings of each row directly; axis=1 applies the lambda row by row.
df.apply(lambda x: levenshtein.distance(x['password'], x['attempt']), axis=1)
out:
0 1
1 3
2 4
3 2
dtype: int64
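A common alternative to apply(..., axis=1) is a list comprehension over zip of the two columns, which avoids building a Series for every row and is usually faster on large frames. In this sketch, edit_distance is a hypothetical plain-Python stand-in so the example runs without textdistance installed; with the library available you would call levenshtein.distance(p, a) instead:

```python
import pandas as pd


def edit_distance(s1: str, s2: str) -> int:
    """Hypothetical stand-in for textdistance's levenshtein.distance (two-row DP)."""
    prev = list(range(len(s2) + 1))  # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (c1 != c2)))  # substitution or match
        prev = curr
    return prev[-1]


attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])

# Walk both columns in lockstep; each pair is (password, attempt).
df['distance'] = [edit_distance(p, a) for p, a in zip(df['password'], df['attempt'])]
print(df)
```

Keeping only the previous and current rows of the matrix cuts memory to O(N) while giving the same distances as the full-matrix version.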