Pandas: replace column values based on match from another column

Question

I have a column in my first data frame, df1["ItemType"], as shown below.

Dataframe1

ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage

I need to replace that based on a dictionary created from another data-frame.

Dataframe2

ItemType2            newType
whitePotato          Potato
yellowPotato         Potato
redTomato            Tomato
yellowCabbage
GreenCabbage
yellowCauliflower    yellowCauliflower
greenCauliflower     greenCauliflower
YellowOnions         Onions
WhiteOnions          Onions
yelloSquash          Squash
redOnions            Onions

Notice that:

- In dataframe2, some of the newType values are the same as the ItemType in dataframe1.
- Some ItemType entries in dataframe2 have null values, like yellowCabbage.
- The ItemType entries in dataframe2 are out of order with respect to the ItemType in dataframe1.

I need to replace each value in the Dataframe1 ItemType column with the corresponding newType from Dataframe2 when there is a match on ItemType, keeping the exceptions noted above in mind.
If there is no match, the value needs to stay as it is (no change).

Here is what I have so far:

import pandas as pd

# read the second CSV file
df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
# convert to dict
df2 = df2.set_index('ItemType').T.to_dict('list')

The replace-on-match attempts below do not work: they insert NaN values instead of the actual values. They are based on a discussion here on SO.

df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]

OR

df1['ItemType']=df2['ItemType'].map(df2)

Thanks in advance

EDIT
The column headers in the two data frames have different names: the Dataframe1 column is ItemType1, and the first column of the second data frame is ItemType2. I missed that in my first edit.

piRSquared · Accepted Answer

Use map

All the logic you need:

def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

Let's make 'ItemType2' the index of Dataframe2

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
9         yellowCabbage
10         GreenCabbage
Name: ItemType1, dtype: object

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType,
            dropna=True)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
Name: ItemType1, dtype: object
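
For anyone who wants to run the above end to end, here is a minimal sketch that rebuilds both frames from the data in the question. It assumes the empty newType cells are nulls (written as None here); the names mapped, lookup, kept and dropped are introduced only for this illustration.

```python
import pandas as pd

# Data reconstructed from the question; empty newType cells are assumed null.
Dataframe1 = pd.DataFrame({"ItemType1": [
    "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
    "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
    "WhiteOnions", "yellowCabbage", "GreenCabbage",
]})
Dataframe2 = pd.DataFrame({
    "ItemType2": [
        "whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
        "GreenCabbage", "yellowCauliflower", "greenCauliflower",
        "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions",
    ],
    "newType": [
        "Potato", "Potato", "Tomato", None, None, "yellowCauliflower",
        "greenCauliflower", "Onions", "Onions", "Squash", "Onions",
    ],
})

def update_type(t1, t2, dropna=False):
    # Look each value of t1 up in the index of t2; misses and null
    # replacements both end up as missing values.
    mapped = t1.map(t2)
    # Either drop those rows or fall back to the original value.
    return mapped.dropna() if dropna else mapped.fillna(t1)

# Series indexed by ItemType2, holding the replacement values.
lookup = Dataframe2.set_index("ItemType2")["newType"]

kept = update_type(Dataframe1["ItemType1"], lookup)
dropped = update_type(Dataframe1["ItemType1"], lookup, dropna=True)

print(kept.tolist())
print(len(dropped))
```

Note that map with a Series needs the lookup index (ItemType2 here) to be unique, which holds for the sample data.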

Verify

updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)

pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])
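
A small sketch of the same side-by-side check on a reduced sample, in case it helps. The frames here are a subset of the question's data, and purpleEggplant is a made-up value added only to show a row with no match at all. It concatenates two Series rather than a DataFrame and a Series, so the resulting columns are simply 'old' and 'new'.

```python
import pandas as pd

# Subset of the question's data; "purpleEggplant" is hypothetical (no match).
old = pd.Series(
    ["redTomato", "whitePotato", "greenCauliflower",
     "yellowCabbage", "purpleEggplant"],
    name="ItemType1",
)
lookup = pd.Series(
    ["Potato", "Tomato", None, "greenCauliflower"],
    index=["whitePotato", "redTomato", "yellowCabbage", "greenCauliflower"],
    name="newType",
)

# Same logic as update_type with dropna=False.
updated = old.map(lookup).fillna(old)

# Put old and new values next to each other for inspection.
compare = pd.concat([old, updated], axis=1, keys=["old", "new"])

# Rows whose value actually changed.
changed = compare[compare["old"] != compare["new"]]

print(compare)
print(changed)
```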

Timing

def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)
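
If you want to reproduce the comparison yourself, something like the following sketch with the standard timeit module should work. The frames are rebuilt from the question's data, the repeat count of 50 is arbitrary, and no particular result is implied; numbers will vary with data size and pandas version.

```python
import timeit

import pandas as pd

items = [
    "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
    "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
    "WhiteOnions", "yellowCabbage", "GreenCabbage",
]
Dataframe1 = pd.DataFrame({"ItemType1": items})
Dataframe2 = pd.DataFrame({
    "ItemType2": items,
    "newType": [
        "Tomato", "Potato", "Potato", "greenCauliflower", "yellowCauliflower",
        "Squash", "Onions", "Onions", "Onions", None, None,
    ],
})

def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

def root(Dataframe1, Dataframe2):
    # replace-based approach: drop null replacements, then replace by key
    return Dataframe1["ItemType1"].replace(
        Dataframe2.set_index("ItemType2")["newType"].dropna())

def piRSquared(Dataframe1, Dataframe2):
    # map-based approach: look up, then fall back to the original value
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index("ItemType2").newType
    return update_type(t1, t2)

# Both approaches should agree before their speed is compared.
same = root(Dataframe1, Dataframe2).tolist() == piRSquared(Dataframe1, Dataframe2).tolist()

n = 50  # arbitrary repeat count for this sketch
root_time = timeit.timeit(lambda: root(Dataframe1, Dataframe2), number=n)
pir_time = timeit.timeit(lambda: piRSquared(Dataframe1, Dataframe2), number=n)
print(same, root_time / n, pir_time / n)
```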

draco_alpine · Answer

This method requires that you rename the matching column in both data frames to 'type'; you can then use merge and np.where.

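import numpy as np

# a left merge keeps every row of df1, whether or not it has a match
df3 = df1.merge(df2, how='left', on='type')[['type', 'newType']]

# fall back to the original value where no replacement was found
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])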
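
A self-contained sketch of the merge approach on a small sample, for illustration. It uses how='left', an assumption made here so that rows of df1 without a match survive the merge. The sample is a subset of the question's data, and purpleEggplant is a made-up value with no entry in the mapping.

```python
import numpy as np
import pandas as pd

# Both frames share the key column name 'type', as this method requires.
df1 = pd.DataFrame({
    "type": ["redTomato", "whitePotato", "yellowCabbage", "purpleEggplant"],
})
df2 = pd.DataFrame({
    "type": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})

# Left merge: every row of df1 is kept, in its original order;
# rows without a match get a missing newType.
df3 = df1.merge(df2, how="left", on="type")

# Use the original value wherever the replacement is missing.
df3["newType"] = np.where(df3["newType"].isnull(), df3["type"], df3["newType"])

print(df3)
```

The merge does not require a unique key in df2, but duplicate keys there would duplicate rows in the result.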

Tags: python, pandas, dataframe, python-2.7

Anil_M
