
Check element-wise for existence of a string

I'm looking for a way to check whether one string can be found in another string. str.contains only takes a single pattern as its argument, but I'd like an element-wise comparison between two string columns.

import pandas as pd

df = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'],
                   'short': ['some', 'other', 'stuff']})


# This fails because str.contains expects a single pattern, not a Series:
df['short_in_long'] = df['long'].str.contains(df['short'])

Expected Output:

[True, True, False]
E. Sommer asked Mar 15 '19 13:03


3 Answers

Use a list comprehension with zip:

df['short_in_long'] = [b in a for a, b in zip(df['long'], df['short'])]

print(df)
            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False
jezrael answered Oct 21 '22 07:10


This is a prime use case for a list comprehension:

# df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values.tolist()]
df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values]
df

            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False

List comprehensions are usually faster than string methods because they carry less overhead. See "For loops with pandas - When should I care?".
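
If you want to check the speed claim on your own machine, a rough timing sketch along these lines may help. The `big` frame and both helper functions are made up for illustration. Since str.contains has no element-wise form, apply(axis=1) stands in as the pandas-method comparison here, which is an assumption about what a fair baseline looks like. Timings will vary, so none are shown.

```python
import timeit

import pandas as pd

# Hypothetical test data: the question's three rows repeated to 6,000 rows.
big = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'] * 2_000,
                    'short': ['some', 'other', 'stuff'] * 2_000})

def with_listcomp():
    # Plain Python loop over the two columns in parallel.
    return [b in a for a, b in zip(big['long'], big['short'])]

def with_apply():
    # Row-wise apply; builds a Series object for every row.
    return big.apply(lambda row: row['short'] in row['long'], axis=1).tolist()

# Both approaches must agree before their speed is compared.
assert with_listcomp() == with_apply()

print('list comprehension:', timeit.timeit(with_listcomp, number=3))
print('apply(axis=1):     ', timeit.timeit(with_apply, number=3))
```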


If your data contains NaNs, you can call a function with error handling:

def try_check(haystack, needle):
    try:
        return needle in haystack
    except TypeError:
        return False

df['short_in_long'] = [try_check(x, y) for x, y in df[['long', 'short']].values]
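
As a quick illustration of why the try/except matters, here is a small sketch on made-up data with one missing value in each column. The `df_nan` frame is hypothetical; `try_check` is repeated so the snippet runs on its own. A missing value on either side of `in` raises a TypeError, which the function turns into False.

```python
import numpy as np
import pandas as pd

def try_check(haystack, needle):
    try:
        return needle in haystack
    except TypeError:
        # Raised when either side is NaN/None instead of a string.
        return False

# Hypothetical data: row 1 has a missing 'long', row 2 a missing 'short'.
df_nan = pd.DataFrame({'long': ['sometext', np.nan, 'evenmoretext'],
                       'short': ['some', 'other', None]})

result = [try_check(x, y) for x, y in zip(df_nan['long'], df_nan['short'])]
print(result)  # [True, False, False]
```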
cs95 answered Oct 21 '22 05:10


You can also check with NumPy, which works row-wise :-)

import numpy as np

np.char.find(df['long'].values.astype(str), df['short'].values.astype(str)) != -1
Out[302]: array([ True,  True, False])
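
One caveat worth knowing if your columns can contain NaN: astype(str) turns a missing value into the literal text 'nan', which can then match by accident. The sketch below uses made-up values ('banana' is chosen on purpose) to show the effect, so you may want to mask or drop missing rows first.

```python
import numpy as np

# Hypothetical values: the second needle is missing.
long_vals = np.array(['sometext', 'banana'], dtype=object)
short_vals = np.array(['some', np.nan], dtype=object)

# astype(str) converts NaN to the literal text 'nan' ...
print(short_vals.astype(str))  # ['some' 'nan']

# ... and 'nan' happens to be a substring of 'banana'.
found = np.char.find(long_vals.astype(str), short_vals.astype(str)) != -1
print(found)  # [ True  True]
```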
BENY answered Oct 21 '22 07:10