I want to extend the question asked here.
The solutions to that question return True or False for each element, and the resulting boolean mask can be used to select the matching values.
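For context, here is a minimal sketch of that True/False approach. It assumes the linked answers join the search terms into one regex alternation and call `str.contains`. The link is not preserved, so this is a reconstruction, and the names `mask` and `matched` are illustrative.

```python
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# Assumed approach: one alternation pattern, True wherever any term occurs
mask = s.str.contains('|'.join(searchfor))

# The mask (illustrative name) selects matching rows, but not which term matched
matched = s[mask]
print(matched.tolist())  # ['cat', 'hat', 'dog', 'fog']
```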
However, I want to get the search term that each value matched.
For example (borrowing from the above question):
import pandas as pd
s = pd.Series(['cat','hat','dog','fog','pet'])
searchfor = ['og', 'at']
I want to know that 'cat' matched 'at' and 'dog' matched 'og'.
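A more direct route is sketched here as an alternative, not taken from either answer below. Wrap the alternation in a single capture group and use `str.extract`, which returns the text that matched (NaN if nothing did). The `re.escape` call is an added assumption so the terms are treated as literal substrings. `pattern` and `matched_term` are illustrative names.

```python
import re

import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# One capture group around the escaped alternation, i.e. '(og|at)'
pattern = '({})'.format('|'.join(map(re.escape, searchfor)))

# expand=False returns a Series: the matched term, or NaN when no term occurs
matched_term = s.str.extract(pattern, expand=False)
print(matched_term.tolist())
```

If a word contains more than one term, the occurrence that starts earliest in the word is returned.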
IIUC, you want the values to reflect the index of the item in the searchfor list that matched your word. You can start by building a regex for each item in searchfor:
# {'^.*og.*$': '0', '^.*at.*$': '1'}; using w avoids shadowing the Series s
m = {'^.*{}.*$'.format(w): str(i) for i, w in enumerate(searchfor)}
This is a dictionary of <pattern : index> mappings. Now, call pd.Series.replace with regex=True, then convert the digit strings to integers and mark words that matched nothing with -1:
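idx = s.replace(m, regex=True)  # 'cat' -> '1', 'dog' -> '0'; 'pet' is left unchanged
# keep the digit strings, mark unmatched words with -1, then convert to integers
idx = idx.where(idx.str.isdigit(), '-1').astype('int64')
idx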
0 1
1 1
2 0
3 0
4 -1
dtype: int64
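The numbers above are positions in `searchfor`, with -1 meaning no match. If you want the terms themselves, one small follow-up is to map each position through a dict built from `searchfor`. This is a sketch, and `idx` and `terms` are illustrative names. -1 is not a key, so it becomes NaN:

```python
import pandas as pd

searchfor = ['og', 'at']
# The result shown above: position in searchfor, or -1 for "no match"
idx = pd.Series([1, 1, 0, 0, -1])

# {0: 'og', 1: 'at'}; positions missing from the dict (here -1) map to NaN
terms = idx.map(dict(enumerate(searchfor)))
print(terms.tolist())
```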
If you want a list of matched words grouped by pattern, you can use str.extract + groupby + apply:
p = '(^.*({}).*$)'.format('|'.join(searchfor))
s.str.extract(p, expand=True)\
.groupby([1])[0]\
.apply(list)
1
at [cat, hat]
og [dog, fog]
Name: 0, dtype: object
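To see why this works, it can help to print the intermediate frame. This is a sketch on the same data, and `parts` and `grouped` are illustrative names. Column 0 is the outer group (the whole word) and column 1 is the inner group (the term that matched). 'pet' matches neither term, so its row is all NaN, and `groupby` drops NaN keys by default, which is why it is missing from the result.

```python
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

p = '(^.*({}).*$)'.format('|'.join(searchfor))
# Outer group -> column 0 (whole word), inner group -> column 1 (matched term)
parts = s.str.extract(p, expand=True)
print(parts)

# groupby skips the NaN key from 'pet', so only matched words are collected
grouped = parts.groupby(1)[0].apply(list)
print(grouped.to_dict())
```

Because the leading `.*` is greedy, a word containing both terms would be grouped under the one that occurs last in the word.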
This one uses defaultdict + replace. I finally got it working:
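import collections

# map every search term to '' so replace() strips it out: 'cat' -> 'c'
d = dict.fromkeys(searchfor, '')
s1 = s.replace(d, regex=True)

# build {position: {leftover_text: ''}}, a per-column nested replace mapping
d = collections.defaultdict(dict)
for x, y in zip(s1.index, s1):
    d[x][y] = ''
# transpose so each word is its own column, strip its leftover, transpose back
s.to_frame('a').T.replace(dict(d), regex=True).T.a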
0 at
1 at
2 og
3 og
4
Name: a, dtype: object
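The idea behind this answer is indirect. The first `replace` deletes the search terms, leaving the text around each match. The nested `{column: {pattern: ''}}` mapping then deletes that leftover from the original word, so only the matched term remains. Below is a sketch that prints the intermediates on the same data. `result` is an illustrative name.

```python
import collections

import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# Step 1: delete every search term, leaving the text around the match
s1 = s.replace(dict.fromkeys(searchfor, ''), regex=True)
print(s1.tolist())

# Step 2: per-position mapping {position: {leftover: ''}}
d = collections.defaultdict(dict)
for x, y in zip(s1.index, s1):
    d[x][y] = ''
print(dict(d))

# Step 3: after .T each word is its own column, so each leftover is removed
# only from its own word; 'result' is an illustrative name
result = s.to_frame('a').T.replace(dict(d), regex=True).T.a
print(result.tolist())
```

Because each leftover is itself used as a regex and every occurrence is removed, a word such as 'tat' searched for 'at' would come out as 'a' rather than 'at'.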