map str.contains across pandas DataFrame

I'm a beginner with Python. I'm looking to create a dictionary that maps strings to associated values. I have a DataFrame and would like to create a new column where, if a row's string contains one of the keys, the row is tagged with the corresponding value.

import pandas as pd

df = pd.DataFrame({'comp':['dell notebook', 'dell notebook S3', 'dell notepad', 'apple ipad', 'apple ipad2', 'acer chromebook', 'acer chromebookx', 'mac air', 'mac pro', 'lenovo x4'],
              'price':range(10)})

For example, I would like to take the above df and create a new column, df['company'], whose values come from a mapping of strings.

I was thinking of doing something like

product_map = {'dell':'Dell Inc.',
               'apple':'Apple Inc.',
               'acer': 'Acer Inc.',
               'mac': 'Apple Inc.',
               'lenovo': 'Dell Inc.'}

Then I wanted to iterate through the dictionary, check whether each entry in the df.comp column contains one of those strings, and set the df.company column to the matching value from the dictionary.

I'm not sure how to do this correctly, though.
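The loop described above can be sketched with Series.str.contains, the method named in the title. This is a minimal sketch, assuming the df and product_map shown above; regex=False makes each key a literal substring rather than a pattern. If a row contains more than one key, the key that comes later in product_map overwrites the earlier one.

```python
import pandas as pd

df = pd.DataFrame({'comp': ['dell notebook', 'dell notebook S3', 'dell notepad',
                            'apple ipad', 'apple ipad2', 'acer chromebook',
                            'acer chromebookx', 'mac air', 'mac pro', 'lenovo x4'],
                   'price': range(10)})

product_map = {'dell': 'Dell Inc.', 'apple': 'Apple Inc.', 'acer': 'Acer Inc.',
               'mac': 'Apple Inc.', 'lenovo': 'Dell Inc.'}

# Start every row as 'unknown', then overwrite rows whose 'comp' contains a key.
df['company'] = 'unknown'
for key, company in product_map.items():
    # Boolean mask of rows whose 'comp' contains `key` as a literal substring.
    mask = df['comp'].str.contains(key, regex=False)
    df.loc[mask, 'company'] = company

print(df)
```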

asked Feb 02 '18 at 20:02 by Matt W.


2 Answers

There are many ways to do this. Here is one:

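def like_function(x):
    # Return the company for the first key (in dict order) found inside x.
    group = "unknown"  # default when no key matches
    for key in product_map:
        if key in x:
            group = product_map[key]
            break
    return group

# Apply the lookup to every value in the 'comp' column.
df['company'] = df.comp.apply(like_function)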
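The same lookup can also be sketched as one chained expression by building a single regular expression from the keys. This swaps in Series.str.extract plus Series.map, which is not part of the answer above, and assumes the df and product_map from the question. One difference: the regex reports the key that occurs earliest in each string, while like_function returns the first key in dict order, so the two can disagree when a row contains several keys.

```python
import re

import pandas as pd

df = pd.DataFrame({'comp': ['dell notebook', 'dell notebook S3', 'dell notepad',
                            'apple ipad', 'apple ipad2', 'acer chromebook',
                            'acer chromebookx', 'mac air', 'mac pro', 'lenovo x4'],
                   'price': range(10)})

product_map = {'dell': 'Dell Inc.', 'apple': 'Apple Inc.', 'acer': 'Acer Inc.',
               'mac': 'Apple Inc.', 'lenovo': 'Dell Inc.'}

# Build one alternation pattern such as '(dell|apple|acer|mac|lenovo)';
# re.escape guards against keys that contain regex metacharacters.
pattern = '(' + '|'.join(re.escape(k) for k in product_map) + ')'

# extract() returns the matched key per row (NaN when nothing matches),
# map() translates that key to a company, and fillna() supplies the default.
df['company'] = (df['comp'].str.extract(pattern, expand=False)
                 .map(product_map)
                 .fillna('unknown'))

print(df)
```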
answered Oct 06 '22 at 00:10 by aquil.abdullah


Here is an interesting approach, especially if you are learning Python: subclass dict and override __getitem__ so that it matches keys as partial strings.

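class dict_partial(dict):
    def __getitem__(self, value):
        # Return the value of the first key that occurs inside `value`.
        for k in self.keys():
            if k in value:
                return self.get(k)
        else:
            # Runs only when the loop finishes without a match;
            # self.get(None) returns None unless None is itself a key.
            return self.get(None)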

product_map = dict_partial({'dell':'Dell Inc.', 'apple':'Apple Inc.',
                            'acer': 'Acer Inc.', 'mac': 'Apple Inc.',
                            'lenovo': 'Dell Inc.'})

df['company'] = df['comp'].apply(lambda x: product_map[x])

#                comp  price     company
# 0     dell notebook      0   Dell Inc.
# 1  dell notebook S3      1   Dell Inc.
# 2      dell notepad      2   Dell Inc.
# 3        apple ipad      3  Apple Inc.
# 4       apple ipad2      4  Apple Inc.
# 5   acer chromebook      5   Acer Inc.
# 6  acer chromebookx      6   Acer Inc.
# 7           mac air      7  Apple Inc.
# 8           mac pro      8  Apple Inc.
# 9         lenovo x4      9   Dell Inc.

My only annoyance with this method is that overriding __getitem__ in a dict subclass doesn't change dict.get, which still performs an exact-key lookup. If it did, we could get rid of the lambda and use df['comp'].map(product_map.get). There doesn't seem to be a way to get both behaviours from a single override.
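One workaround, sketched here on the assumption that overriding get by hand is acceptable, is to define get explicitly so it shares the partial-match logic; the bound method can then be passed to Series.map. In this version __getitem__ uses dict.__getitem__ for the exact lookup, because calling self.get there would now recurse. The 'hp laptop' row is hypothetical data added only to show the no-match case.

```python
import pandas as pd


class dict_partial(dict):
    """dict whose [] and .get() both match keys as substrings of the lookup value."""

    def __getitem__(self, value):
        # Return the value for the first key (in insertion order) found inside `value`.
        for k in self.keys():
            if k in value:
                # Exact lookup via the base class; self.get here would recurse.
                return dict.__getitem__(self, k)
        return None  # no key matched

    def get(self, value, default=None):
        # Route .get() through the same partial-match logic as [].
        result = self[value]
        return default if result is None else result


product_map = dict_partial({'dell': 'Dell Inc.', 'apple': 'Apple Inc.',
                            'acer': 'Acer Inc.', 'mac': 'Apple Inc.',
                            'lenovo': 'Dell Inc.'})

# 'hp laptop' is a hypothetical row that matches no key.
df = pd.DataFrame({'comp': ['dell notebook', 'apple ipad', 'mac air', 'hp laptop']})

# No lambda needed: the bound get method is applied to each value.
df['company'] = df['comp'].map(product_map.get)
print(df)
```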

answered Oct 06 '22 at 01:10 by jpp