Fastest way to do fuzzy matching of strings between two pandas data frames

Question

I have two data frames, each with a column of names:

df1[name]   -> 3,000 rows

df2[name]   -> 64,000 rows

I am using fuzzywuzzy to get the best match in df2 for each df1 entry, using the following code:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# queries come from df1['name'], candidates from df2['name']
matches = [process.extract(x, df2['name'], limit=1) for x in df1['name']]

But this is taking forever to finish. Is there any faster way to do the fuzzy matching of strings in pandas?

StatguyUser · Accepted Answer

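One change I can see for your code is to use a generator expression: replace the square brackets with round brackets. Note that this does not reduce the total matching work. It only makes the evaluation lazy, so each match is computed when you iterate over the result, the first matches are available sooner, and the full list never has to sit in memory at once.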

matches = (process.extract(x, df2['name'], limit=1) for x in df1['name'])
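To make the behaviour of the generator form concrete, here is a minimal sketch. It is not the original poster's code: the two name lists are made up, best_match is a hypothetical helper, and difflib.SequenceMatcher from the standard library stands in for the fuzzywuzzy scorer so the example runs without extra installs. It only shows that no comparison happens until the generator is consumed.

```python
from difflib import SequenceMatcher

calls = []  # records every query that has actually been matched


def best_match(query, choices):
    """Hypothetical helper: return the choice most similar to query."""
    calls.append(query)
    return max(choices, key=lambda c: SequenceMatcher(None, query, c).ratio())


# Made-up data standing in for df1['name'] and df2['name']
queries = ["Jon Smith", "Alise Brown"]
choices = ["John Smith", "Alice Brown", "Bob Stone"]

lazy = (best_match(q, choices) for q in queries)  # nothing computed yet
calls_before = len(calls)

results = list(lazy)  # iterating is what triggers the matching
calls_after = len(calls)
```

The same number of comparisons runs in the end, so the generator helps with memory and with streaming results, not with total run time.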

Edit: One more suggestion: we can parallelize the operation with the multiprocessing library.
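A rough sketch of what the multiprocessing suggestion could look like follows. The helper names best_match and match_all, the worker count, the chunk size, and the sample lists are all assumptions for illustration, and difflib.SequenceMatcher again stands in for fuzzywuzzy so the block is self-contained. With fuzzywuzzy installed, the body of best_match could call process.extractOne instead.

```python
from difflib import SequenceMatcher
from functools import partial
from multiprocessing import Pool


def best_match(query, choices):
    """Return (best_choice, score on a 0-100 scale) for one query string."""
    def score(choice):
        return SequenceMatcher(None, query, choice).ratio()

    best = max(choices, key=score)
    return best, round(100 * score(best))


def match_all(queries, choices, workers=4):
    """Match every query against all choices, spread over worker processes."""
    worker = partial(best_match, choices=choices)
    with Pool(workers) as pool:
        # chunksize batches queries so the choices list is shipped less often
        return pool.map(worker, queries, chunksize=64)


if __name__ == "__main__":
    # Made-up lists standing in for df1['name'].tolist() and df2['name'].tolist()
    queries = ["Jon Smith", "Alise Brown"]
    choices = ["John Smith", "Alice Brown", "Bob Stone"]
    print(match_all(queries, choices, workers=2))
```

Each query is independent of the others, which is why the work splits cleanly across processes. The speedup is bounded by the number of cores, and the main guard is needed on platforms that start workers by spawning a fresh interpreter.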

kunal deep
