I have two string columns in my pandas DataFrame:
name1     name2
John Doe  John Doe
AleX T    Franz K
and I need to check whether name1 equals name2 in each row.
The naive approach I use now is a simple mask:
mask=df.name1==df.name2
The problem is that the strings may be mislabeled in ways I cannot predict (the data is too big to inspect), which prevents an exact match.
For instance, "John Doe" and "John Doe " would not match. Of course, I have trimmed and lower-cased my strings, but other possibilities remain.
One idea would be to check whether name1 is contained in name2, but it seems I cannot use str.contains with another column as the argument. Any other ideas?
Many thanks!
EDIT: using isin gives nonsensical results.
Example:
test = pd.DataFrame({'A': ["john doe", " john doe", 'John'], 'B': [' john doe', 'eddie murphy', 'batman']})
test
Out[6]: 
           A             B
0   john doe      john doe
1   john doe  eddie murphy
2       John        batman
test['A'].isin(test['B'])
Out[7]: 
0    False
1     True
2    False
Name: A, dtype: bool
I think you can use str.lower and str.replace with the arbitrary-whitespace pattern \s+:
test = pd.DataFrame({'A': ["john  doe", " john doe", 'John'], 
                     'B': [' john doe', 'eddie murphy', 'batman']})
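# Lower-case both columns and remove every run of whitespace before comparing.
print(test['A'].str.lower().str.replace(r'\s+', '', regex=True) ==
      test['B'].str.lower().str.replace(r'\s+', '', regex=True))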
0     True
1    False
2    False
dtype: bool
Strip the spaces and lower the case:
In [414]:
test['A'].str.strip().str.lower() == test['B'].str.strip().str.lower()
Out[414]:
0     True
1    False
2    False
dtype: bool
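As for the containment idea from the question: str.contains takes a single pattern for the whole column, so one way around it is to compare the two columns row by row in plain Python. This is only a minimal sketch on the same toy frame; the names a, b and contained are made up for illustration.

```python
import pandas as pd

test = pd.DataFrame({'A': ["john doe", " john doe", 'John'],
                     'B': [' john doe', 'eddie murphy', 'batman']})

# Normalise both columns first: strip outer spaces and lower the case.
a = test['A'].str.strip().str.lower()
b = test['B'].str.strip().str.lower()

# Row-wise substring test: is the cleaned A value found inside the cleaned B value?
contained = pd.Series([x in y for x, y in zip(a, b)], index=test.index)
print(contained)
```

Keep in mind that a substring test is looser than equality: "john" would also count as contained in "john doe", which may or may not be what you want.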
You can use difflib to compute a similarity ratio between the two strings (1.0 means identical):
import difflib as dfl
dfl.SequenceMatcher(None,'John Doe', 'John doe').ratio()
Edit: integration with pandas:
import pandas as pd
import difflib as dfl
df = pd.DataFrame({'A': ["john doe", " john doe", 'John'], 'B': [' john doe', 'eddie murphy', 'batman']})
df['VAR1'] = df.apply(lambda x : dfl.SequenceMatcher(None, x['A'], x['B']).ratio(),axis=1)
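To get back to a boolean mask like the one in the question, the ratio can be compared against a cutoff. The sketch below assumes a threshold of 0.9, which is an arbitrary value to tune on your own data; the helper similarity and the column name ratio are hypothetical names, not part of difflib or pandas.

```python
import difflib

import pandas as pd

df = pd.DataFrame({'A': ["john doe", " john doe", 'John'],
                   'B': [' john doe', 'eddie murphy', 'batman']})


def similarity(a, b):
    """Return the SequenceMatcher ratio of two strings after basic cleaning."""
    # Strip and lower-case first so case and outer spaces do not lower the score.
    a, b = a.strip().lower(), b.strip().lower()
    return difflib.SequenceMatcher(None, a, b).ratio()


THRESHOLD = 0.9  # assumed cutoff, not a recommended value

df['ratio'] = [similarity(a, b) for a, b in zip(df['A'], df['B'])]
mask = df['ratio'] >= THRESHOLD
print(df[mask])
```

A lower threshold tolerates more typos but also lets more false matches through, so it is worth inspecting a sample of rows near the cutoff.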