I have a string that contains exactly one of several substrings. I want to check which substring it contains and get the value associated with that substring. This is why I would do this operation with a dictionary.
Example:
string_to_check = 'TEST13-872B-A22E'
substrings = {'TEST': 0, 'WORLD': 1, 'CORONA':2}
In this case, 0 should be returned.
The background is that I have a pandas DataFrame (df) with a column string_to_check full of these strings. Based on which substring is contained in each row, I want to assign a value to the respective row of a new column of the dataframe.
Example result:
string_to_check result
'TEST13-872B-A22E' 0
'CORONA1-241-22E' 2
'TEST32-33A-442' 0
'WORLD4-BB2-A343' 1
I guess I could use something along the lines of
def check_string(string_to_check):
    # keys and values are methods, so they must be called
    for stri, val in zip(substrings.keys(), substrings.values()):
        if stri in string_to_check:
            return val
combined with apply. But at the moment I feel too stupid to put the pieces together by myself.
EDIT:
Okay I think I solved this myself:
def check_string(string_to_check):
    for stri, val in zip(substrings.keys(), substrings.values()):
        if stri in string_to_check:
            return val

df['result'] = df['string_to_check'].apply(check_string)
But I am happy to see further suggestions for shorter / more readable / more pythonic ways of doing this.
Just a few suggestions.
Firstly, in your code you can replace zip(substrings.keys(), substrings.values()) with the dict's items method: substrings.items().
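As a rough sketch of that substitution, the function from the question could look like the following. The explicit return None at the end is my own addition to make the no-match case visible; the original function returns None implicitly in that situation.

```python
substrings = {'TEST': 0, 'WORLD': 1, 'CORONA': 2}

def check_string(string_to_check):
    # items() yields (substring, value) pairs directly, so no zip is needed
    for stri, val in substrings.items():
        if stri in string_to_check:
            return val
    # Nothing matched (added here for clarity; this was implicit before)
    return None
```

It is used with apply exactly as before: df['string_to_check'].apply(check_string).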
Then, if you want, you can use a lambda function inside the apply method. This lambda function produces the desired output:
lambda x: [val for key, val in substrings.items() if key in x][0]
Be careful: if none of the substrings is present in string_to_check, the list is empty and the [0] raises an IndexError.
df['result'] = df['string_to_check'].apply(lambda x: [val for key, val in substrings.items() if key in x][0])
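If rows without any matching substring are possible, one way to avoid the IndexError is next() with a default value. This is only a sketch: the helper name first_match, the fallback value -1, and the third sample row are my assumptions, not something from the question.

```python
import pandas as pd

substrings = {'TEST': 0, 'WORLD': 1, 'CORONA': 2}

def first_match(text, default=-1):
    # next() returns the first matching value, or `default` when the
    # generator is empty (no substring found); -1 is an arbitrary sentinel
    return next((val for key, val in substrings.items() if key in text), default)

# Hypothetical sample data; the last row contains none of the substrings
df = pd.DataFrame({'string_to_check': ['TEST13-872B-A22E',
                                       'CORONA1-241-22E',
                                       'UNKNOWN-000']})
df['result'] = df['string_to_check'].apply(first_match)
```

Unlike the list comprehension, the generator also stops at the first hit instead of scanning all keys.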
For the first question, use a list comprehension to iterate over the key-value pairs obtained from dict.items(), and check whether each key is in the string:
>>> string_to_check = 'TEST13-872B-A22E'
>>> substrings = {'TEST': 0, 'WORLD': 1, 'CORONA':2}
>>> [val for key, val in substrings.items() if key in string_to_check]
[0]
But for your actual problem, you can use str.join to join the dict.keys() with the | character into a regex pattern, use pandas.Series.str.extract to extract the matching key from each string, and then pandas.Series.map the extracted key to its value in substrings:
>>> df
string_to_check
0 'TEST13-872B-A22E'
1 'CORONA1-241-22E'
2 'TEST32-33A-442'
3 'WORLD4-BB2-A343'
>>> df.assign(result=
...     df.string_to_check
...     .str.extract(f"({'|'.join(substrings.keys())})", expand=False)
...     .map(substrings))
string_to_check result
0 'TEST13-872B-A22E' 0
1 'CORONA1-241-22E' 2
2 'TEST32-33A-442' 0
3 'WORLD4-BB2-A343' 1
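For completeness, here is a self-contained version of the same idea that can be run as a script. It is a sketch under a few assumptions: the DataFrame is rebuilt from the sample rows in the question (without the literal quote characters), and the keys are passed through re.escape, which the snippet above does not do but which seems safer if a key could ever contain regex metacharacters such as . or +.

```python
import re
import pandas as pd

substrings = {'TEST': 0, 'WORLD': 1, 'CORONA': 2}

df = pd.DataFrame({'string_to_check': ['TEST13-872B-A22E',
                                       'CORONA1-241-22E',
                                       'TEST32-33A-442',
                                       'WORLD4-BB2-A343']})

# Build one alternation pattern with a single capture group: (TEST|WORLD|CORONA)
pattern = '(' + '|'.join(re.escape(key) for key in substrings) + ')'

# extract() returns the matched key per row; map() looks up its value
df['result'] = (df['string_to_check']
                .str.extract(pattern, expand=False)
                .map(substrings))
```

Rows that match none of the keys come out of extract as NaN and stay NaN after map, so no exception is raised for them.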