 

Pandas: replace column values based on match from another column

I have a column in the first data frame, df1["ItemType"], as below:

Dataframe1

ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage

I need to replace those values based on a dictionary created from another data frame:

Dataframe2

ItemType2          newType
whitePotato        Potato
yellowPotato       Potato
redTomato          Tomato
yellowCabbage   
GreenCabbage    
yellowCauliflower   yellowCauliflower
greenCauliflower    greenCauliflower
YellowOnions        Onions
WhiteOnions         Onions
yelloSquash         Squash
redOnions           Onions
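To experiment with the attempts below, the two frames can be rebuilt in memory instead of being read from CSV. This is a minimal sketch that assumes the blank newType cells (yellowCabbage, GreenCabbage) should be missing values, which is what pd.read_csv produces for empty fields. It uses the column names ItemType1 and ItemType2 from the EDIT further down.

```python
import pandas as pd

# Dataframe1: the column whose values should be replaced
df1 = pd.DataFrame({"ItemType1": [
    "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
    "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
    "WhiteOnions", "yellowCabbage", "GreenCabbage",
]})

# Dataframe2: old type -> new type; blank cells are represented as None
df2 = pd.DataFrame({
    "ItemType2": [
        "whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
        "GreenCabbage", "yellowCauliflower", "greenCauliflower",
        "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions",
    ],
    "newType": [
        "Potato", "Potato", "Tomato", None, None, "yellowCauliflower",
        "greenCauliflower", "Onions", "Onions", "Squash", "Onions",
    ],
})

print(df2["newType"].isna().sum())  # -> 2
```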

Notice that,

  • In dataframe2, some of the ItemType values are the same as the ItemType values in dataframe1.
  • Some ItemType values in dataframe2, such as yellowCabbage, have a null newType.
  • The ItemType values in dataframe2 are out of order with respect to ItemType in dataframe1.

I need to replace each value in the Dataframe1 ItemType column with the corresponding Dataframe2 newType wherever it matches a Dataframe2 ItemType, keeping the exceptions listed above in mind.
If there is no match, the values need to stay as they are (no change).

What I have so far:

import pandas as pd

# read the second CSV file
df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
# convert to dict
df2 = df2.set_index('ItemType').T.to_dict('list')
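The conversion above produces a dictionary whose values are one-element lists, such as {'whitePotato': ['Potato'], ...}. It also rebinds df2 to that dictionary, so a later expression like df2['ItemType'] no longer refers to a DataFrame. The sketch below produces a flat {old: new} mapping instead. It assumes that rows with a blank newType should simply be skipped, and it uses a small hypothetical stand-in for mappings.csv.

```python
import pandas as pd

# Small stand-in for mappings.csv (column names as in the read_csv call above)
mapping_df = pd.DataFrame({
    "ItemType": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})

# What the question's conversion produces: each value wrapped in a list
as_lists = mapping_df.set_index("ItemType").T.to_dict("list")
print(as_lists["whitePotato"])  # -> ['Potato']

# A flat {old: new} dict that skips rows whose newType is missing
flat = mapping_df.dropna(subset=["newType"]).set_index("ItemType")["newType"].to_dict()
print(flat)  # -> {'whitePotato': 'Potato', 'redTomato': 'Tomato'}
```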

The replace-on-match attempts below are not working; they insert NaN values instead of the actual values. They are based on discussions here on SO.

df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]

OR

df1['ItemType']=df2['ItemType'].map(df2)
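Part of the NaN problem comes from how lookups behave in pandas. Series.map with a plain dict returns NaN for every value that is not a key, and also for keys whose mapped value is null (like yellowCabbage here). The original value therefore has to be restored explicitly. This is a minimal illustration that uses hypothetical sample values.

```python
import pandas as pd

# "blueBerry" is a hypothetical item with no entry in the mapping
items = pd.Series(["redTomato", "yellowCabbage", "blueBerry"])
mapping = {"redTomato": "Tomato", "yellowCabbage": None}

mapped = items.map(mapping)
# Unmatched keys and null mappings both come back as missing values
print(mapped.isna().tolist())  # -> [False, True, True]
```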

Thanks in advance

EDIT
The column headers in the two data frames have different names: the dataframe1 column is ItemType1, and the first column in the second data frame is ItemType2. I missed that in the first edit.

asked Jul 19 '16 19:07 by Anil_M



2 Answers

Use map

All the logic you need:

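def update_type(t1, t2, dropna=False):
    # Look up each value of t1 in the index of t2; misses and null mappings become NaN.
    # dropna=True drops those rows; otherwise the original t1 value is kept.
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)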

Make 'ItemType2' the index of Dataframe2, so that map looks up each ItemType1 value in that index:

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
9         yellowCabbage
10         GreenCabbage
Name: ItemType1, dtype: object

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType,
            dropna=True)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
Name: ItemType1, dtype: object

Verify

updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)

pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])



Timing

def root(Dataframe1, Dataframe2):
    # replace() swaps only values found in the mapping; null mappings are dropped first
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)
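The timing results were posted as an image. The sketch below shows one way to reproduce such a comparison with the standard-library timeit module. It assumes small in-memory stand-ins for the two frames, taken as a subset of the question's data. Absolute numbers depend on the machine and pandas version, so none are claimed here. Before timing, the sketch checks that both functions return the same values.

```python
import timeit

import pandas as pd

def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)

# Small stand-ins for the question's frames (a subset of the posted data)
Dataframe1 = pd.DataFrame({"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "redOnions"]})
Dataframe2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage", "redOnions"],
    "newType": ["Potato", "Tomato", None, "Onions"],
})

# Both approaches should agree before their speed is compared
assert root(Dataframe1, Dataframe2).tolist() == piRSquared(Dataframe1, Dataframe2).tolist()

for fn in (root, piRSquared):
    seconds = timeit.timeit(lambda: fn(Dataframe1, Dataframe2), number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s for 1000 runs")
```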


answered Sep 19 '22 13:09 by piRSquared



This method requires that you rename both key columns to 'type'. You can then combine merge and np.where:

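import numpy as np

# a left merge keeps the rows of df1 that have no match in df2
df3 = df1.merge(df2, how='left', on='type')[['type', 'newType']]

# fall back to the original type where newType is missing
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])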
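The sketch below shows the renaming step that this method relies on. It applies the method to a small stand-in for the question's frames, using the column names ItemType1 and ItemType2 from the question's EDIT. The sketch uses how='left' so that rows of df1 with no entry in the mapping are kept and fall back to their original value, as the question requires. The item blueBerry is hypothetical and stands in for an unmatched value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ItemType1": ["redTomato", "yellowCabbage", "blueBerry"]})  # blueBerry: hypothetical
df2 = pd.DataFrame({
    "ItemType2": ["redTomato", "yellowCabbage"],
    "newType": ["Tomato", None],
})

# Give both key columns the common name 'type' so merge can join on it
left = df1.rename(columns={"ItemType1": "type"})
right = df2.rename(columns={"ItemType2": "type"})

# A left merge keeps every row of df1, including ones with no mapping
df3 = left.merge(right, how="left", on="type")[["type", "newType"]]

# Fall back to the original type where newType is missing
df3["newType"] = np.where(df3["newType"].isnull(), df3["type"], df3["newType"])
print(df3["newType"].tolist())  # -> ['Tomato', 'yellowCabbage', 'blueBerry']
```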
answered Sep 20 '22 13:09 by draco_alpine
