I have two pandas DataFrames. One contains a list of correctly spelled words:
[In]: df1
[Out]:
words
0 apple
1 phone
2 clock
3 table
4 clean
and one with misspelled words:
[In]: df2
[Out]:
misspelled
0 aple
1 phn
2 alok
3 garbage
4 appl
5 pho
The goal is to replace the column of misspelled words in the second DataFrame using the list of correctly spelled words from the first DataFrame. The second DataFrame can contain repeated words, can be a different size than the first, and can contain words that aren't in the first DataFrame (or aren't similar enough to match).
I've been trying to use difflib.get_close_matches with some success, but it does not work out perfectly.
This is what I have so far:
import numpy as np
import pandas as pd
from difflib import get_close_matches

x = list(map(lambda x: get_close_matches(x, df1.words), df2.misspelled))
good_words = list(map(''.join, x))
l = np.array(good_words, dtype='object')
df2.misspelled = pd.Series(l)
df2 = df2[df2.misspelled != '']
After applying the transformation, I should get the second DataFrame to look like:
[In]: df2
[Out]:
0
0 apple
1 phone
2 clock
3 NaN
4 apple
5 phone
If no match is found, the row gets replaced with NaN. My problem is that I get a result that looks like this:
[In]: df2
[Out]:
misspelled
0 apple
1 phone
2 clockclean
3 NaN
4 apple
5 phone
At the time of writing I have not figured out why some of the words are combined. I suspect it has something to do with difflib.get_close_matches matching different words that are similar in length and/or lettering. So far, roughly 10% to 15% of the words in a whole column come out combined like this.
Thanks in advance.
If you want only the first value returned by get_close_matches, use next with iter, which also lets you supply a default value when there is no match (here np.nan). The cutoff parameter can be adjusted to your desired threshold:
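import difflib
import numpy as np

# Take the best match for each word, or np.nan when nothing passes the cutoff
x = [next(iter(matches), np.nan)
     for matches in (difflib.get_close_matches(word, df1.words, cutoff=0.6) for word in df2.misspelled)]
df2['col1'] = x
print(df2)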
  misspelled   col1
0       aple  apple
1        phn  phone
2       alok  clock
3    garbage    NaN
4       appl  apple
5        pho  phone
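As for why words get combined in the question's code: get_close_matches returns a list that may hold several candidates (up to n, which defaults to 3), and ''.join concatenates all of them into one string. The sketch below illustrates this on the sample words and shows an alternative that asks for a single candidate with n=1. The helper name best_match is hypothetical, and plain lists stand in for the DataFrame columns so the snippet runs on its own.

```python
from difflib import get_close_matches

# Sample data copied from the question, as plain lists
words = ["apple", "phone", "clock", "table", "clean"]
misspelled = ["aple", "phn", "alok", "garbage", "appl", "pho"]

# get_close_matches returns a list of up to n (default 3) candidates,
# so ''.join glues every candidate together when more than one qualifies.
all_matches = get_close_matches("aple", words)
joined = "".join(all_matches)
print(all_matches, "->", joined)


def best_match(word, candidates, cutoff=0.6):
    """Hypothetical helper: return the single closest candidate, or NaN if none."""
    matches = get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else float("nan")


corrected = [best_match(w, words) for w in misspelled]
print(corrected)
```

With the DataFrames from the question, the same helper could presumably be applied as df2['col1'] = df2['misspelled'].map(lambda w: best_match(w, df1['words'])), assuming the column names shown above. Raising cutoff makes matching stricter, so more rows end up as NaN.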