
How do I calculate the Levenshtein distance between two Pandas DataFrame columns?

I'm trying to calculate the Levenshtein distance between two Pandas columns, but I'm getting stuck. I'm using the textdistance library. Here is a minimal, reproducible example:

import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor']]

df = pd.DataFrame(attempts, columns=['password', 'attempt'])
print(df)
   password  attempt
0  passw0rd  pasw0rd
1   passwrd   psword
2    psw0rd  passwor

My poor attempt:

df.apply(lambda x: levenshtein.distance(*zip(x['password'] + x['attempt'])), axis=1)

This is how the function works. It takes two strings as arguments:

levenshtein.distance('helloworld', 'heloworl')
Out[1]: 2
asked Jan 31 '20 15:01 by Nicolas Gervais


People also ask

How is Levenshtein distance calculated?

The Levenshtein distance is usually calculated by filling a matrix of size (M+1)x(N+1), where M and N are the lengths of the two words. Two nested for loops walk through the matrix, and each cell takes the cheapest of an insertion, a deletion, or a substitution based on its neighbouring cells. The bottom-right cell holds the final distance.
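The matrix approach described above can be sketched in plain Python. This is a minimal illustration, not the implementation textdistance uses internally; the function name levenshtein_distance is hypothetical and chosen only for this example.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance between a and b using the full (M+1) x (N+1) matrix."""
    m, n = len(a), len(b)

    # dist[i][j] = number of edits needed to turn a[:i] into b[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]

    # First column: turn a[:i] into the empty string by deleting i characters
    for i in range(m + 1):
        dist[i][0] = i
    # First row: turn the empty string into b[:j] by inserting j characters
    for j in range(n + 1):
        dist[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # No cost if the current characters already match
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )

    return dist[m][n]


print(levenshtein_distance("helloworld", "heloworl"))  # 2
```

The full matrix is kept here for readability; since each row only depends on the previous one, keeping two rows would be enough if memory matters.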

What is Levenshtein distance in Python?

The Levenshtein distance between two strings is defined as the minimum number of single-character insertions, deletions, or replacements needed to transform one string, string1, into another, string2. For example, 'pasw0rd' can be converted into 'passw0rd' by inserting an 's', so the distance between them is 1.


1 Answer

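Maybe I'm missing something, but is there a reason you don't want to call the function directly in the lambda expression? There is no need for zip: pass the two column values as the two string arguments. This works for me: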

import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]

df = pd.DataFrame(attempts, columns=['password', 'attempt'])

# Pass the two column values of each row as the two string arguments
df.apply(lambda x: levenshtein.distance(x['password'], x['attempt']), axis=1)

out:

0    1
1    3
2    4
3    2
dtype: int64
answered Nov 14 '22 22:11 by Andrea