Search and return index of matching substring with pandas

I want to extend the question asked here

The solutions to that question return True or False, and those boolean values can be used to select the matching rows.

However, I want to get the search value that matched each string.

For example (borrowing from the above question):

import pandas as pd

s = pd.Series(['cat','hat','dog','fog','pet'])
searchfor = ['og', 'at']

I want to know that 'cat' matched 'at' and 'dog' matched 'og'.

Sharvari Gc asked Mar 06 '23 23:03



2 Answers

IIUC, you want each value replaced by the index of the searchfor item that matched it. You can start by building a pattern-to-index mapping from searchfor -

m = {'^.*{}.*$'.format(s) : str(i) for i, s in enumerate(searchfor)}

This is a dictionary of <pattern : index> mappings. Now, call pd.Series.replace with regex=True -

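import numpy as np

r = s.replace(m, regex=True)  # keep the original s intact
idx = np.where(r.str.isdigit(), pd.to_numeric(r, errors='coerce'), -1)
idx = pd.Series(idx, index=s.index).astype('int64')

idx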

0    1
1    1
2    0
3    0
4   -1
dtype: int64
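
As a rough alternative sketch (not the approach above, and assuming the search terms are meant as literal substrings, hence the re.escape), the matched term can be pulled out with str.extract and then mapped to its position in searchfor. The names matched, index_of and idx are only illustrative.

```python
import re
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# One capture group holding any of the (escaped) search terms
pattern = '({})'.format('|'.join(re.escape(t) for t in searchfor))

# The matched term for each row, NaN where nothing matched
matched = s.str.extract(pattern, expand=False)

# Map each matched term to its position in searchfor, -1 for no match
index_of = {term: i for i, term in enumerate(searchfor)}
idx = matched.map(index_of).fillna(-1).astype('int64')
# idx holds 1, 1, 0, 0, -1
```

Here matched already answers the original question ('at' for cat and hat, 'og' for dog and fog), and idx is just the same information expressed as list positions.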

If you want a list of matched values by pattern, apply str.extract + groupby + apply to the original s -

p = '(^.*({}).*$)'.format('|'.join(searchfor))

s.str.extract(p, expand=True)\
 .groupby([1])[0]\
 .apply(list)

1
at    [cat, hat]
og    [dog, fog]
Name: 0, dtype: object
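
For reference, here is a self-contained sketch of the same grouping idea that captures only the matched term and groups the original series by it. It assumes literal search terms (escaped with re.escape), and the names matched and groups are made up for illustration.

```python
import re
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# Capture whichever search term occurs in each word (NaN if none does)
pattern = '({})'.format('|'.join(re.escape(t) for t in searchfor))
matched = s.str.extract(pattern, expand=False)

# Group the original words by the term they matched;
# rows with a NaN key (no match) are dropped by groupby
groups = s.groupby(matched).apply(list)
# groups: at -> ['cat', 'hat'], og -> ['dog', 'fog']
```

Words without a match, such as 'pet', simply do not appear in the result.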
cs95 answered Mar 23 '23 10:03



This uses defaultdict + replace; I finally got it working:

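import collections

# Strip every search term, leaving only the unmatched remainder of each word
d = dict(zip(searchfor, [''] * len(searchfor)))
s1 = s.replace(d, regex=True)

# Build a per-row mapping {index: {remainder: ''}}
d = collections.defaultdict(dict)
for x, y in zip(s1.index, s1):
    d[x][y] = ''

# Removing the remainder from each original word leaves the matched term
s.to_frame('a').T.replace(dict(d), regex=True).T.a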


Out[765]: 
0    at
1    at
2    og
3    og
4      
Name: a, dtype: object
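
If regular expressions are not needed, a plain-Python sketch can produce the same result as the output above by checking each term with `in`. This is a different technique from the replace-based one, and first_match is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

def first_match(word, terms):
    """Return the first term contained in word, or '' if none is."""
    return next((t for t in terms if t in word), '')

# Apply the helper to every element of the series
matched = s.map(lambda w: first_match(w, searchfor))
# matched holds 'at', 'at', 'og', 'og', ''
```

When a word contains more than one search term, this returns whichever comes first in searchfor, which may or may not be what you want.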
BENY answered Mar 23 '23 11:03
