Efficiently identifying whether part of string is in list/dict keys? [closed]

I have a large number (>100,000) of lowercase strings in a list; a small subset might look like this:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

I also have a dict like this (in reality it will have around 1,000 entries):

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

For all strings in the list which contain any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result should thus be:

str_list = ["dk", "us", "nothing here"]

What is the most efficient way to do this given the number of strings I have and the length of the dict?

Extra info: There is never more than one dict key in a string.

Emjora asked Mar 06 '23 14:03



1 Answer

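A straightforward way is to loop over the strings and test each dict key with a substring check. The else branch of the inner for loop runs only when no key matched, so the original string is kept in that case: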

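input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

output_strings = []

for string in input_strings:
    for key, value in dict_x.items():
        # Substring test: does this key occur anywhere in the string?
        if key in string:
            output_strings.append(value)
            break  # at most one key per string, so stop at the first hit
    else:
        # Reached only if the inner loop finished without a break (no key matched).
        output_strings.append(string)
print(output_strings)  # ['dk', 'us', 'nothing here']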
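
As a possible alternative (not part of the approach above), the keys can be combined into a single compiled regular expression, so that each string is scanned with one search call instead of one substring test per key. This is only a sketch: the function name replace_with_codes is made up for illustration, and whether it is actually faster for ~1,000 keys and >100,000 strings should be measured (for example with timeit) on the real data. It relies on the stated assumption that a string never contains more than one key; if several keys could occur, this version returns the leftmost match, whereas the loop above returns the first matching key in dict order.

```python
import re


def replace_with_codes(strings, mapping):
    """Replace every string that contains a key of `mapping` with that key's value.

    Hypothetical helper for illustration; strings without any key are kept as-is.
    """
    if not mapping:
        # An empty alternation would match everywhere, so handle this case up front.
        return list(strings)

    # Put longer keys first so that a key which is a prefix of another key
    # cannot shadow it when both start at the same position.
    keys = sorted(mapping, key=len, reverse=True)

    # re.escape makes sure keys are matched literally, even if they contain
    # characters that are special in regular expressions.
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    result = []
    for s in strings:
        match = pattern.search(s)
        # match.group(0) is the key that was found; look up its replacement.
        result.append(mapping[match.group(0)] if match else s)
    return result


str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

print(replace_with_codes(str_list, dict_x))  # ['dk', 'us', 'nothing here']
```

Like the loop version, this matches plain substrings, so a key such as "norway" would also match inside a longer word. If only whole words should count, the pattern could be wrapped in \b word boundaries.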
mrCarnivore answered Apr 10 '23 11:04
