This seems like it should be a common use case, but I'm not finding any good guidance on it. I have a solution that works, but I would prefer a vectorized lookup to the Pandas apply() function.
Here is an example of what I am doing:
import pandas as pd
example_dict = {
    "category1": {
        "field1": 0.0,
        "filed2": 5.0},
    "category2": {
        "field1": 5.0,
        "field2": 8.0}}
d = {"ids": range(10),
     "category": ["category1" if x % 2 == 0 else "category2" for x in range(10)]}
df = pd.DataFrame(d)
# The operation I am trying to vectorize
df['category_data'] = df.apply(lambda row: example_dict[row['category']], axis=1)
On the last line you can see where I am using the apply() function to perform the dictionary lookup. My gut tells me there should be a way to vectorize this. I may be wrong, but I would like to know that as well. I often run into scenarios where I need to look up information in a dictionary and add it as a column to a DataFrame.
You can use map on the category column:
df['map']=df.category.map(example_dict)
df
Out[839]:
category ids category_data \
0 category1 0 {'field1': 0.0, 'filed2': 5.0}
1 category2 1 {'field1': 5.0, 'field2': 8.0}
2 category1 2 {'field1': 0.0, 'filed2': 5.0}
3 category2 3 {'field1': 5.0, 'field2': 8.0}
4 category1 4 {'field1': 0.0, 'filed2': 5.0}
5 category2 5 {'field1': 5.0, 'field2': 8.0}
6 category1 6 {'field1': 0.0, 'filed2': 5.0}
7 category2 7 {'field1': 5.0, 'field2': 8.0}
8 category1 8 {'field1': 0.0, 'filed2': 5.0}
9 category2 9 {'field1': 5.0, 'field2': 8.0}
map
0 {'field1': 0.0, 'filed2': 5.0}
1 {'field1': 5.0, 'field2': 8.0}
2 {'field1': 0.0, 'filed2': 5.0}
3 {'field1': 5.0, 'field2': 8.0}
4 {'field1': 0.0, 'filed2': 5.0}
5 {'field1': 5.0, 'field2': 8.0}
6 {'field1': 0.0, 'filed2': 5.0}
7 {'field1': 5.0, 'field2': 8.0}
8 {'field1': 0.0, 'filed2': 5.0}
9 {'field1': 5.0, 'field2': 8.0}
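One thing to watch with map is what happens to values that are not keys of the dictionary. A minimal sketch, assuming a reduced lookup dictionary and a hypothetical extra value "category3" that is not in the question's data:

```python
import pandas as pd

# Reduced lookup for illustration; "category3" is a hypothetical unmatched value.
lookup = {"category1": {"field1": 0.0}, "category2": {"field1": 5.0}}
df = pd.DataFrame({"category": ["category1", "category2", "category3"]})

# Series.map looks up each value in the dict; values with no matching key become NaN.
mapped = df["category"].map(lookup)

# Count the rows that found no match.
n_missing = int(mapped.isna().sum())
```

Checking n_missing after the lookup is a cheap way to catch misspelled or unexpected categories before the NaN values propagate further.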
If you need them in separate columns:
pd.DataFrame(df['map'].tolist())
Out[843]:
field1 field2 filed2
0 0.0 NaN 5.0
1 5.0 8.0 NaN
2 0.0 NaN 5.0
3 5.0 8.0 NaN
4 0.0 NaN 5.0
5 5.0 8.0 NaN
6 0.0 NaN 5.0
7 5.0 8.0 NaN
8 0.0 NaN 5.0
9 5.0 8.0 NaN
Or
df['map'].apply(pd.Series)
Out[844]:
field1 field2 filed2
0 0.0 NaN 5.0
1 5.0 8.0 NaN
2 0.0 NaN 5.0
3 5.0 8.0 NaN
4 0.0 NaN 5.0
5 5.0 8.0 NaN
6 0.0 NaN 5.0
7 5.0 8.0 NaN
8 0.0 NaN 5.0
9 5.0 8.0 NaN
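Both snippets above return a new DataFrame rather than adding columns to df. A sketch of attaching the expanded columns back to the original frame follows. It assumes the "filed2" key in the question is a typo for "field2" (which is why the outputs above show a separate filed2 column) and uses a shorter four-row frame for brevity:

```python
import pandas as pd

# Assumption: keys are spelled consistently, unlike the "filed2" key in the question.
example_dict = {
    "category1": {"field1": 0.0, "field2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}
df = pd.DataFrame({"ids": range(4), "category": ["category1", "category2"] * 2})

# Look up the nested dict for each row, then expand the dicts into columns.
nested = df["category"].map(example_dict)
expanded = pd.DataFrame(nested.tolist(), index=df.index)

# Reusing df.index above keeps the rows aligned when joining back.
result = df.join(expanded)
```

Building the frame from tolist() is usually faster than apply(pd.Series), since the latter constructs one Series per row.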
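You could create a second DataFrame from example_dict and then merge the two DataFrames:
# list() makes this safe on Python 3, where keys() and values() return views rather than lists
d2 = pd.DataFrame(list(example_dict.keys()), columns=['category']).assign(
    category_data=list(example_dict.values()))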
df.merge(d2,on='category',how='left')
category ids category_data
0 category1 0 {u'filed2': 5.0, u'field1': 0.0}
1 category2 1 {u'field2': 8.0, u'field1': 5.0}
2 category1 2 {u'filed2': 5.0, u'field1': 0.0}
3 category2 3 {u'field2': 8.0, u'field1': 5.0}
4 category1 4 {u'filed2': 5.0, u'field1': 0.0}
5 category2 5 {u'field2': 8.0, u'field1': 5.0}
6 category1 6 {u'filed2': 5.0, u'field1': 0.0}
7 category2 7 {u'field2': 8.0, u'field1': 5.0}
8 category1 8 {u'filed2': 5.0, u'field1': 0.0}
9 category2 9 {u'field2': 8.0, u'field1': 5.0}
To separate the dictionary values into columns:
d2 = pd.DataFrame(example_dict).T
df.merge(d2,how='left',left_on='category',right_index=True)
category ids field1 field2 filed2
0 category1 0 0.0 NaN 5.0
1 category2 1 5.0 8.0 NaN
2 category1 2 0.0 NaN 5.0
3 category2 3 5.0 8.0 NaN
4 category1 4 0.0 NaN 5.0
5 category2 5 5.0 8.0 NaN
6 category1 6 0.0 NaN 5.0
7 category2 7 5.0 8.0 NaN
8 category1 8 0.0 NaN 5.0
9 category2 9 5.0 8.0 NaN
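If only one of the nested values is needed, it may be simpler to flatten the dictionary first so that map returns scalars instead of dicts. A sketch under the same assumption that the keys are spelled consistently; the name field1_lookup is introduced here for illustration only:

```python
import pandas as pd

# Assumption: consistent key spelling; a four-row frame is used for brevity.
example_dict = {
    "category1": {"field1": 0.0, "field2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}
df = pd.DataFrame({"ids": range(4), "category": ["category1", "category2"] * 2})

# Flatten to a category -> scalar mapping for the one field of interest.
field1_lookup = {cat: fields["field1"] for cat, fields in example_dict.items()}

# The mapped column now holds plain floats rather than dict objects.
df["field1"] = df["category"].map(field1_lookup)
```

The resulting column has a numeric dtype, so later arithmetic on it stays vectorized instead of operating on Python objects.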