I'm looking for a way to check whether one string can be found in another string. str.contains only takes a single fixed pattern as its argument; I would rather have an element-wise comparison between two string columns.
import pandas as pd
df = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'],
'short': ['some', 'other', 'stuff']})
# This fails:
df['short_in_long'] = df['long'].str.contains(df['short'])
Expected Output:
[True, True, False]
Use a list comprehension with zip:
df['short_in_long'] = [b in a for a, b in zip(df['long'], df['short'])]
print(df)
            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False
This is a prime use case for a list comprehension:
# df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values.tolist()]
df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values]
df
            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False
List comprehensions are usually faster than string methods because they carry less overhead. See "For loops with pandas - When should I care?".
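If you want to check the speed claim on your own data, a rough timing sketch like the one below may help. It compares the list comprehension against a row-wise apply, which is my own choice of baseline and not something from the answer above; the repeated sample frame df_big and both helper functions are hypothetical, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: the question's three rows repeated 1,000 times.
df_big = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'] * 1_000,
                       'short': ['some', 'other', 'stuff'] * 1_000})


def with_comprehension():
    # Pair the two columns element-wise and test substring membership.
    return [b in a for a, b in zip(df_big['long'], df_big['short'])]


def with_apply():
    # Row-wise apply, used here only as an assumed baseline for comparison.
    return df_big.apply(lambda row: row['short'] in row['long'], axis=1).tolist()


# Both approaches should agree before their timings are compared.
assert with_comprehension() == with_apply()

t_comp = timeit.timeit(with_comprehension, number=3)
t_apply = timeit.timeit(with_apply, number=3)
print(f"list comprehension: {t_comp:.3f}s, apply(axis=1): {t_apply:.3f}s")
```

Run it on a frame that resembles your real data before drawing conclusions, since the gap depends on row count and string length.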
If your data contains NaNs, you can call a function with error handling:
def try_check(haystack, needle):
    try:
        return needle in haystack
    except TypeError:
        # Raised when either value is not a string (e.g. NaN)
        return False

df['short_in_long'] = [try_check(x, y) for x, y in df[['long', 'short']].values]
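To see how the error-handling version behaves, here is a small sketch with a missing value in each column. The frame df_nan is made-up sample data, not from the question, and the function is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def try_check(haystack, needle):
    """Return True if needle is a substring of haystack, False for non-string input."""
    try:
        return needle in haystack
    except TypeError:
        # Raised when either value is NaN (a float) instead of a string.
        return False


# Hypothetical sample data: row 1 has a missing 'long', row 2 a missing 'short'.
df_nan = pd.DataFrame({'long': ['sometext', np.nan, 'evenmoretext'],
                       'short': ['some', 'other', np.nan]})

result = [try_check(x, y) for x, y in zip(df_nan['long'], df_nan['short'])]
df_nan['short_in_long'] = result
print(result)  # [True, False, False]
```

Rows with a missing value on either side come out as False here; if you would rather keep them as missing, return np.nan from the except branch instead.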
Check with numpy; it works row-wise:
import numpy as np
np.char.find(df.long.values.astype(str), df.short.values.astype(str)) != -1
Out[302]: array([ True, True, False])
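One thing to watch with the numpy approach: astype(str) turns a missing value into the literal text 'nan', which can then match as a substring. The sketch below illustrates this on made-up data (df_nan and the word 'banana' are my own example, and the notna mask is one possible workaround, not part of the original answer).

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first 'short' value is missing.
df_nan = pd.DataFrame({'long': ['banana', 'sometext'],
                       'short': [np.nan, 'some']}, dtype=object)

haystacks = df_nan['long'].values.astype(str)
needles = df_nan['short'].values.astype(str)  # NaN becomes the text 'nan'

# 'banana' contains 'nan', so the missing value produces a false positive.
mask = np.char.find(haystacks, needles) != -1

# Assumed workaround: only keep matches where both original values are present.
valid = (df_nan['long'].notna() & df_nan['short'].notna()).to_numpy()
safe_mask = mask & valid

print(mask.tolist())       # [True, True]
print(safe_mask.tolist())  # [False, True]
```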