Python

Question

I have a string that contains exactly one of several substrings. I want to check which substring it contains and get the value associated with that substring, which is why I want to store the substrings and their values in a dictionary.

Example:

string_to_check = 'TEST13-872B-A22E'
substrings = {'TEST': 0, 'WORLD': 1, 'CORONA':2}

In this case, 0 should be returned.
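For a single string, one compact option is `next()` over a generator expression. It stops at the first key found in the string and returns a fallback instead of raising when no key matches. This is a minimal sketch; `lookup` and its `default` parameter are illustrative names, not from the question. Because each string is assumed to contain exactly one of the keys, "first match" and "the match" are the same thing here.

```python
def lookup(string_to_check, substrings, default=None):
    """Return the value of the first key in `substrings` that occurs in
    `string_to_check`, or `default` if none of the keys occurs."""
    # next() takes the first item the generator yields; its second argument
    # is returned instead of raising StopIteration when nothing matches.
    return next(
        (val for key, val in substrings.items() if key in string_to_check),
        default,
    )


substrings = {'TEST': 0, 'WORLD': 1, 'CORONA': 2}
print(lookup('TEST13-872B-A22E', substrings))  # 0
print(lookup('HELLO-123', substrings))         # None (no key matches)
```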

The background is that I have a pandas DataFrame (df) with a column string_to_check full of these strings. Based on which substring is contained in each row, I want to assign a value to the respective row of a new column of the dataframe.

Example result:

string_to_check       result

'TEST13-872B-A22E'    0
'CORONA1-241-22E'     2
'TEST32-33A-442'      0
'WORLD4-BB2-A343'     1

I guess I could use something along the lines of

def check_string(string_to_check):
    for stri, val in zip(substrings.keys, substrings.values):
        if stri in string_to_check:
            return val

combined with apply. But at the moment I feel too stupid to put the pieces together by myself.

EDIT:

Okay, I think I solved this myself:

def check_string(string_to_check):
    for stri, val in zip(substrings.keys(), substrings.values()):
        if stri in string_to_check:
            return val

df['result'] = df['string_to_check'].apply(check_string)

But I am happy to see further suggestions for shorter / more readable / more pythonic ways of doing this.

Sankios · Accepted Answer

Just a few pieces of advice.

First, in your code you can replace zip(substrings.keys(), substrings.values()) with the dict's items() method, i.e. substrings.items(), which yields the (key, value) pairs directly.
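Applied to the asker's function, that change could look like the following. It is a minimal sketch that reuses the example data from the question; the DataFrame literal is just that example typed out, and the explicit `return None` only makes the no-match case visible.

```python
import pandas as pd

substrings = {'TEST': 0, 'WORLD': 1, 'CORONA': 2}


def check_string(string_to_check):
    # items() yields (key, value) pairs directly, so zip() is not needed
    for stri, val in substrings.items():
        if stri in string_to_check:
            return val
    return None  # no key matched


df = pd.DataFrame({'string_to_check': ['TEST13-872B-A22E', 'CORONA1-241-22E',
                                       'TEST32-33A-442', 'WORLD4-BB2-A343']})
df['result'] = df['string_to_check'].apply(check_string)
print(df)
```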

Then, if you prefer, you can use a lambda function inside the apply call. This lambda produces the desired output:

lambda x: [val for key, val in substrings.items() if key in x][0]

Be careful: if none of the substrings is present in string_to_check, the lambda raises an IndexError, because [0] is applied to an empty list.

df['result'] = df['string_to_check'].apply(lambda x: [val for key, val in substrings.items() if key in x][0])
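If rows that match none of the keys can occur, one way to avoid the IndexError mentioned above is to use `next()` with a default instead of `[0]`. This is a sketch; the `'HELLO-000'` row is made-up data included only to show the no-match case, which ends up as a missing value in the new column.

```python
import pandas as pd

substrings = {'TEST': 0, 'WORLD': 1, 'CORONA': 2}

df = pd.DataFrame({'string_to_check': ['TEST13-872B-A22E', 'WORLD4-BB2-A343',
                                       'HELLO-000']})  # last row matches no key

# next(..., None) returns None when the generator yields nothing,
# whereas [...][0] raises IndexError on an empty list.
df['result'] = df['string_to_check'].apply(
    lambda x: next((val for key, val in substrings.items() if key in x), None)
)
print(df)
```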

Sayandip Dutta · Answer

For the first question, use a list comprehension that iterates over the key, value pairs returned by dict.items() and keeps each value whose key is contained in the string:

>>> string_to_check = 'TEST13-872B-A22E'
>>> substrings = {'TEST': 0, 'WORLD': 1, 'CORONA':2}

>>> [val for key, val in substrings.items() if key in string_to_check]
[0]

But for your actual problem, you can use str.join to join the dict.keys() with the | character into a regex alternation, use pandas.Series.str.extract to pull the matching key out of each string, and then use pandas.Series.map to translate the extracted keys into their values from substrings:

>>> df
      string_to_check
0  'TEST13-872B-A22E'
1   'CORONA1-241-22E'
2    'TEST32-33A-442'
3   'WORLD4-BB2-A343'

>>> df.assign(result=
           df.string_to_check
             .str.extract(f"({'|'.join(substrings.keys())})", expand=False)
             .map(substrings))

      string_to_check  result
0  'TEST13-872B-A22E'       0
1   'CORONA1-241-22E'       2
2    'TEST32-33A-442'       0
3   'WORLD4-BB2-A343'       1
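The pattern built with '|'.join treats the keys as regular expressions. If a key could ever contain regex metacharacters, or if one key could be a prefix of another, a slightly more defensive variant escapes the keys and lists longer ones first. This is a sketch under those assumptions; none of the keys in the question actually needs it.

```python
import re

import pandas as pd

substrings = {'TEST': 0, 'WORLD': 1, 'CORONA': 2}

df = pd.DataFrame({'string_to_check': ['TEST13-872B-A22E', 'CORONA1-241-22E',
                                       'TEST32-33A-442', 'WORLD4-BB2-A343']})

# re.escape() makes keys containing metacharacters (., +, ( ...) match
# literally; sorting by length puts longer keys first in the alternation, so
# at the same position e.g. 'TESTING' would be tried before its prefix 'TEST'.
pattern = '|'.join(map(re.escape, sorted(substrings, key=len, reverse=True)))

# str.extract returns the first match in each row (NaN when there is none);
# map() then translates each matched key into its value.
df['result'] = (df['string_to_check']
                .str.extract(f'({pattern})', expand=False)
                .map(substrings))
print(df)
```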

Python - Get dictionary value if key is contained in a string

Tags:

string

pandas

flurble
