Beginner with python - I'm looking to create a dictionary mapping of strings, and the associated value. I have a dataframe and would like create a new column where if the string matches, it tags the column as x.
df = pd.DataFrame({'comp':['dell notebook', 'dell notebook S3', 'dell notepad', 'apple ipad', 'apple ipad2', 'acer chromebook', 'acer chromebookx', 'mac air', 'mac pro', 'lenovo x4'],
'price':range(10)})
For Example I would like to take the above df
and create a new column df['company']
and set it to a mapping of strings.
I was thinking of doing something like
product_map = {'dell':'Dell Inc.',
'apple':'Apple Inc.',
'acer': 'Acer Inc.',
'mac': 'Apple Inc.',
'lenovo': 'Dell Inc.'}
Then I wanted to iterate through it to check the df.comp
column and see if each entry contained one of those strings, and to set the df.company
column to the value in the dictionary.
Not sure how to do this correctly though.
There are many ways to do this. One way to do it would be the following:
def like_function(x):
group = "unknown"
for key in product_map:
if key in x:
group = product_map[key]
break
return group
df['company'] = df.comp.apply(like_function)
Here is an interesting way, especially if you are learning about python. You can subclass dict
and override __getitem__
to look for partial strings.
class dict_partial(dict):
def __getitem__(self, value):
for k in self.keys():
if k in value:
return self.get(k)
else:
return self.get(None)
product_map = dict_partial({'dell':'Dell Inc.', 'apple':'Apple Inc.',
'acer': 'Acer Inc.', 'mac': 'Apple Inc.',
'lenovo': 'Dell Inc.'})
df['company'] = df['comp'].apply(lambda x: product_map[x])
comp price company
# 0 dell notebook 0 Dell Inc.
# 1 dell notebook S3 1 Dell Inc.
# 2 dell notepad 2 Dell Inc.
# 3 apple ipad 3 Apple Inc.
# 4 apple ipad2 4 Apple Inc.
# 5 acer chromebook 5 Acer Inc.
# 6 acer chromebookx 6 Acer Inc.
# 7 mac air 7 Apple Inc.
# 8 mac pro 8 Apple Inc.
# 9 lenovo x4 9 Dell Inc.
My only annoyance with this method is that subclassing dict
does't override dict.get
at the same time as []
syntax. If this were possible, we could get rid of the lambda
and use df['comp'].map(product_map.get)
. There doesn't seem to be an obvious solution to this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With