I have a pandas DataFrame and want to check a certain column for substrings. At the moment I have 30 lines of code like this:
(df['NAME'].str.upper().str.contains('LIMITED')) |
(df['NAME'].str.upper().str.contains('INC')) |
(df['NAME'].str.upper().str.contains('CORP'))
They are all combined with an "or" (|) condition, and if any of them is true, the name belongs to a company rather than a person.
But this doesn't seem very elegant to me. Is there a way to check a pandas string column for "does the string in this column contain any of the substrings in the following list": ['LIMITED', 'INC', 'CORP']?
I found the pandas.DataFrame.isin function, but it only matches entire strings, not substrings.
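To illustrate the difference, here is a small sketch on a made-up three-row frame (not the asker's data): Series.isin tests each cell for equality with an element of the list, while Series.str.contains searches for the substring anywhere inside each value.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({'NAME': ['Baby CORP.', 'Baby', 'CORP']})

# isin compares each whole cell value with the list: only the exact value 'CORP' matches
whole = df['NAME'].isin(['CORP'])

# str.contains looks for 'CORP' anywhere inside each value
partial = df['NAME'].str.contains('CORP')

print(whole.tolist())    # [False, False, True]
print(partial.tolist())  # [True, False, True]
```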
You can use a regular expression, in which '|' means "or". Series.str.contains treats its pattern as a regex by default, so you can join the list with '|':
l = ['LIMITED','INC','CORP']
regstr = '|'.join(l)
df['NAME'].str.upper().str.contains(regstr)
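A slightly more defensive variant, sketched under the assumption that the list may one day contain terms with regex metacharacters. The suffix 'S.A.' and the sample names below are hypothetical additions for illustration. re.escape makes each term match literally, case=False replaces the .str.upper() call, and na=False maps missing names to False instead of NaN.

```python
import re
import pandas as pd

# Hypothetical sample names, including a missing value
df = pd.DataFrame({'NAME': ['Baby Corp.', 'Baby', 'Baby INC.', None, 'Bebe S.A.', 'SPAIN']})

# 'S.A.' is a hypothetical extra term; its dots are regex wildcards unless escaped
l = ['LIMITED', 'INC', 'CORP', 'S.A.']

# Unescaped: 'S.A.' also matches 'SPAI' inside 'SPAIN'
naive = df['NAME'].str.contains('|'.join(l), case=False, na=False)

# Escaped: every term is matched literally
regstr = '|'.join(re.escape(term) for term in l)
is_company = df['NAME'].str.contains(regstr, case=False, na=False)

print(naive.tolist())       # [True, False, True, False, True, True]
print(is_company.tolist())  # [True, False, True, False, True, False]
```

For plain alphabetic terms like 'LIMITED', 'INC' and 'CORP', escaping changes nothing, so it is only a safeguard.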
MVCE:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'NAME':['Baby CORP.','Baby','Baby INC.','Baby LIMITED']})
In [3]: df
Out[3]:
NAME
0 Baby CORP.
1 Baby
2 Baby INC.
3 Baby LIMITED
In [4]: l = ['LIMITED','INC','CORP']
...: regstr = '|'.join(l)
...: df['NAME'].str.upper().str.contains(regstr)
...:
Out[4]:
0 True
1 False
2 True
3 True
Name: NAME, dtype: bool
In [5]: regstr
Out[5]: 'LIMITED|INC|CORP'
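One caveat: plain alternation also matches inside longer words, so 'INC' is found in a hypothetical name like 'PRINCE'. A sketch that wraps the alternation in \b word boundaries, using a non-capturing group (?:...) so that str.contains does not warn about match groups:

```python
import pandas as pd

# 'PRINCE' is a hypothetical person's name that contains 'INC'
df = pd.DataFrame({'NAME': ['Baby CORP.', 'PRINCE', 'Baby INC.', 'Baby LIMITED']})

l = ['LIMITED', 'INC', 'CORP']

# Plain alternation: 'INC' matches inside 'PRINCE'
loose = df['NAME'].str.upper().str.contains('|'.join(l))

# \b requires a word boundary on both sides of whichever term matches
strict = df['NAME'].str.upper().str.contains(r'\b(?:' + '|'.join(l) + r')\b')

print(loose.tolist())   # [True, True, True, True]
print(strict.tolist())  # [True, False, True, True]
```

The trade-off is that whole-word matching no longer catches longer forms such as 'CORPORATION' via 'CORP', so those would need to be added to the list explicitly.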