I've a column in first data-frame df1["ItemType"]
as below,
Dataframe1
ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage
I need to replace that based on a dictionary created from another data-frame.
Dataframe2
ItemType2 newType
whitePotato Potato
yellowPotato Potato
redTomato Tomato
yellowCabbage
GreenCabbage
yellowCauliflower yellowCauliflower
greenCauliflower greenCauliflower
YellowOnions Onions
WhiteOnions Onions
yelloSquash Squash
redOnions Onions
Notice that,
dataframe2
some of the ItemType
are same as ItemType
in
dataframe1
. ItemType
in dataframe2 have null
values like yellowCabbage.ItemType
in dataframe2 are out of order with respect toItemType
in dataframe
I need to replace values in Dataframe1
ItemType
column if there is a match for value in the corresponding Dataframe2
ItemType
with newType
keeping above exceptions listed in bullet-points in mind.
If there is no match, then values needs to be as they are [ no change].
So far I got is.
import pandas as pd
#read second `csv-file`
df2 = pd.read_csv('mappings.csv',names = ["ItemType", "newType"])
#conver to dict
df2=df2.set_index('ItemType').T.to_dict('list')
Below given replace on match are not working. They are inserting NaN
values instead of actual. These are based on discussion here on SO.
df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]
OR
df1['ItemType']=df2['ItemType'].map(df2)
Thanks in advance
EDIT
Two column headers in both data frames have different names. So dataframe1 column on is ItemType1 and first column in second data-frame is ItemType2. Missed that on first edit.
Use map
All the logic you need:
def update_type(t1, t2, dropna=False):
return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)
Let's make 'ItemType2'
the index of Dataframe2
update_type(Dataframe1.ItemType1,
Dataframe2.set_index('ItemType2').newType)
0 Tomato
1 Potato
2 Potato
3 greenCauliflower
4 yellowCauliflower
5 Squash
6 Onions
7 Onions
8 Onions
9 yellowCabbage
10 GreenCabbage
Name: ItemType1, dtype: object
update_type(Dataframe1.ItemType1,
Dataframe2.set_index('ItemType2').newType,
dropna=True)
0 Tomato
1 Potato
2 Potato
3 greenCauliflower
4 yellowCauliflower
5 Squash
6 Onions
7 Onions
8 Onions
Name: ItemType1, dtype: object
updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)
pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])
def root(Dataframe1, Dataframe2):
return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())
def piRSquared(Dataframe1, Dataframe2):
t1 = Dataframe1.ItemType1
t2 = Dataframe2.set_index('ItemType2').newType
return update_type(t1, t2)
This method requires you set your column names to 'type', then you can set off using merge and np.where
df3 = df1.merge(df2,how='inner',on='type')['type','newType']
df3['newType'] = np.where(df['newType'].isnull(),df['type'],df['newType'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With