I'm a beginner with Python. I'm looking to create a dictionary that maps strings to associated values. I have a DataFrame and would like to create a new column where, if the string matches, the row is tagged as x.
import pandas as pd

df = pd.DataFrame({'comp':['dell notebook', 'dell notebook S3', 'dell notepad', 'apple ipad', 'apple ipad2', 'acer chromebook', 'acer chromebookx', 'mac air', 'mac pro', 'lenovo x4'],
              'price':range(10)})
For example, I would like to take the above df and create a new column df['company'], filled in from a mapping of strings.
I was thinking of doing something like:
product_map = {'dell':'Dell Inc.',
               'apple':'Apple Inc.',
               'acer': 'Acer Inc.',
               'mac': 'Apple Inc.',
               'lenovo': 'Dell Inc.'}
Then I wanted to iterate through it, check whether each entry in the df.comp column contains one of those strings, and set the df.company column to the matching value in the dictionary.
I'm not sure how to do this correctly, though.
There are many ways to do this. One way to do it would be the following:
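def like_function(x):
    # Default label when no key in product_map occurs in x
    group = "unknown"
    for key in product_map:
        # Substring test: the first matching key (in dict order) wins
        if key in x:
            group = product_map[key]
            break
    return group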
df['company'] = df.comp.apply(like_function)
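Another of the many ways, sketched here as an alternative rather than as part of the answer above, is to let pandas do the matching with a regular expression built from the dictionary keys. The `pattern` name is only illustrative. One difference to be aware of: the regex picks the match that occurs earliest in the string, whereas like_function picks the first key in dictionary order, so the two can disagree when several keys occur in one entry.

```python
import re
import pandas as pd

df = pd.DataFrame({'comp': ['dell notebook', 'dell notebook S3', 'dell notepad',
                            'apple ipad', 'apple ipad2', 'acer chromebook',
                            'acer chromebookx', 'mac air', 'mac pro', 'lenovo x4'],
                   'price': range(10)})

product_map = {'dell': 'Dell Inc.',
               'apple': 'Apple Inc.',
               'acer': 'Acer Inc.',
               'mac': 'Apple Inc.',
               'lenovo': 'Dell Inc.'}

# Build one alternation group from the keys; re.escape guards against
# keys that contain regex metacharacters.
pattern = '(' + '|'.join(re.escape(k) for k in product_map) + ')'

# Extract the matched key, translate it through the dictionary,
# and label entries that matched nothing as "unknown".
matched_key = df['comp'].str.extract(pattern, expand=False)
df['company'] = matched_key.map(product_map).fillna('unknown')
```

For a small frame like this one the apply version is just as good; the vectorized form mostly pays off on larger data.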
Here is an interesting way, especially if you are learning about Python. You can subclass dict and override __getitem__ to look for partial strings.
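class dict_partial(dict):
    def __getitem__(self, value):
        # Return the value of the first key that occurs as a substring of `value`
        for k in self.keys():
            if k in value:
                return self.get(k)
        else:
            # No key matched: fall back to the entry stored under None, if any
            return self.get(None)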
product_map = dict_partial({'dell':'Dell Inc.', 'apple':'Apple Inc.',
                            'acer': 'Acer Inc.', 'mac': 'Apple Inc.',
                            'lenovo': 'Dell Inc.'})
df['company'] = df['comp'].apply(lambda x: product_map[x])
#                comp  price     company
# 0     dell notebook      0   Dell Inc.
# 1  dell notebook S3      1   Dell Inc.
# 2      dell notepad      2   Dell Inc.
# 3        apple ipad      3  Apple Inc.
# 4       apple ipad2      4  Apple Inc.
# 5   acer chromebook      5   Acer Inc.
# 6  acer chromebookx      6   Acer Inc.
# 7           mac air      7  Apple Inc.
# 8           mac pro      8  Apple Inc.
# 9         lenovo x4      9   Dell Inc.
My only annoyance with this method is that subclassing dict doesn't override dict.get at the same time as the [] syntax. If it did, we could get rid of the lambda and use df['comp'].map(product_map.get). There doesn't seem to be an obvious solution to this.
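One possible workaround, offered as a sketch rather than a definitive fix, is to override get explicitly and have __getitem__ delegate to it. The class below is a variation on dict_partial, not the original: the lookup goes through super().__getitem__ so that the two overridden methods do not call each other recursively, and a missing match returns the default instead of the entry stored under None.

```python
import pandas as pd


class dict_partial(dict):
    """dict whose lookups match any key that is a substring of the query."""

    def get(self, value, default=None):
        # Return the value of the first key contained in `value`
        for k in self:
            if k in value:
                # Use the plain dict lookup to avoid recursing into our override
                return super().__getitem__(k)
        return default

    def __getitem__(self, value):
        # The [] syntax shares the same partial-match logic
        return self.get(value)


product_map = dict_partial({'dell': 'Dell Inc.', 'apple': 'Apple Inc.',
                            'acer': 'Acer Inc.', 'mac': 'Apple Inc.',
                            'lenovo': 'Dell Inc.'})

df = pd.DataFrame({'comp': ['dell notebook', 'dell notebook S3', 'dell notepad',
                            'apple ipad', 'apple ipad2', 'acer chromebook',
                            'acer chromebookx', 'mac air', 'mac pro', 'lenovo x4'],
                   'price': range(10)})

# The bound method is an ordinary callable, so no lambda is needed
df['company'] = df['comp'].map(product_map.get)
```

Passing the bound method matters here: it makes map call it once per element, so the partial-match logic is what runs.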