Pandas - Replacing Values by Looking Up in Another DataFrame

I have a problem to solve with pandas in Python 3. I have two dataframes. The first one is:

    ID Name  Linked Model 1  Linked Model 2  Linked Model 3
0  100    A          1111.0          1112.0             NaN
1  101    B          1112.0          1113.0          1115.0
2  102    C             NaN             NaN             NaN
3  103    D          1114.0             NaN             NaN
4  104    E          1114.0          1111.0          1112.0

The second one is:

   Model ID Name
0      1111    A
1      1112  A,B
2      1113    C
3      1114    D
4      1115    Q
5      1116    Z
6      1117    E
7      1118    W

The code should look up each value in the Linked Model columns (for instance, Linked Model 1), find the corresponding value in the Name column of the second dataframe, and replace the Model ID with that name, as shown in the expected result:

[image: expected result]

As you can see in the expected result, None stays as None (it could be replaced with NumPy NaN), and the Model IDs in the first dataframe are replaced with their corresponding names from the second dataframe.

I am looking forward to hearing your solutions!

Thanks

iSerd asked Dec 17 '18 15:12


2 Answers

Initialise a replacement dictionary and use df.replace to map those IDs to Names.

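# Build a {Model ID: Name} lookup from the second dataframe
m = df2.set_index('Model ID')['Name'].to_dict()
# Select only the "Linked Model" columns
v = df.filter(like='Linked Model')
# Replace IDs with names; NaN and IDs not in the lookup are left unchanged
df[v.columns] = v.replace(m)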

df
    ID Name Linked Model 1 Linked Model 2 Linked Model 3
0  100    A              A            A,B            NaN
1  101    B            A,B              C              Q
2  102    C            NaN            NaN            NaN
3  103    D              D            NaN            NaN
4  104    E              D              A            A,B
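
A possible variation, sketched here as an assumption rather than as part of the answer above: `Series.map` can do the same lookup column by column. The practical difference is that `replace` leaves IDs missing from the dictionary unchanged, while `map` turns them into NaN. The frames are rebuilt from the question's data; the names `mapped` and `unmatched` and the ID 9999 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the two dataframes from the question
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104],
    "Name": ["A", "B", "C", "D", "E"],
    "Linked Model 1": [1111, 1112, np.nan, 1114, 1114],
    "Linked Model 2": [1112, 1113, np.nan, np.nan, 1111],
    "Linked Model 3": [np.nan, 1115, np.nan, np.nan, 1112],
})
df2 = pd.DataFrame({
    "Model ID": [1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118],
    "Name": ["A", "A,B", "C", "D", "Q", "Z", "E", "W"],
})

# Same {Model ID: Name} lookup as in the answer
m = df2.set_index("Model ID")["Name"].to_dict()

# Map each "Linked Model" column through the lookup, on a copy of df
mapped = df.copy()
for col in df.filter(like="Linked Model").columns:
    mapped[col] = df[col].map(m)

# Hypothetical ID 9999 is not in the lookup, so map() yields NaN for it
unmatched = pd.Series([1111.0, 9999.0]).map(m)

print(mapped)
print(unmatched)
```

If unmatched IDs should stay visible in the output, `replace` is probably the better fit; if they should be flagged as missing, `map` may be preferable.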
cs95 answered Sep 19 '22 03:09


This is my first attempt at answering a Python question, so while this is certainly longer than coldspeed's answer, it makes more sense to me using the melt, merge, and pivot functions.

import pandas as pd
import numpy as np

# make an object from the first dataset

df_1 = pd.DataFrame(
  {"ID" : [100, 101, 102, 103, 104],
  "Name" : ["A", "B", "C", "D", "E"],
  "Linked Model 1" : [1111, 1112, np.nan, 1114, 1114],
  "Linked Model 2" : [1112, 1113, np.nan, np.nan, 1111],
  "Linked Model 3" : [np.nan, 1115, np.nan, np.nan, 1112]})

# make an object for the second data set

df_2 = pd.DataFrame(
  {"Model ID" : [1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118],
  "Name" : ["A", "A,B", "C", "D", "Q", "Z", "E", "W"]})

# tidy the data
df_1 = pd.melt(df_1, ["ID", "Name"]) 

# left join the second data set
df_1 = pd.merge(df_1, df_2, how='left', left_on='value', right_on='Model ID').reset_index()

# pivot the data back out to achieve the desired format
# ('Name_y' is the Name column from df_2, suffixed by the merge)
df_1 = df_1.pivot_table(index='ID',
                        columns='variable',
                        values='Name_y',
                        aggfunc='first',
                        dropna=False)

variable Linked Model 1 Linked Model 2 Linked Model 3
ID                                                   
100                   A            A,B            NaN
101                 A,B              C              Q
102                 NaN            NaN            NaN
103                   D            NaN            NaN
104                   D              A            A,B
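
To make the intermediate steps easier to follow, here is a minimal sketch of the same melt, merge, and pivot idea that keeps each stage in its own variable. It swaps in `DataFrame.pivot` for `pivot_table`, which should be enough here because each (ID, variable) pair occurs only once. The names `long_df`, `merged`, and `wide`, and the `_model` suffix, are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104],
    "Name": ["A", "B", "C", "D", "E"],
    "Linked Model 1": [1111, 1112, np.nan, 1114, 1114],
    "Linked Model 2": [1112, 1113, np.nan, np.nan, 1111],
    "Linked Model 3": [np.nan, 1115, np.nan, np.nan, 1112],
})
df_2 = pd.DataFrame({
    "Model ID": [1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118],
    "Name": ["A", "A,B", "C", "D", "Q", "Z", "E", "W"],
})

# Step 1: melt to long format, one row per (ID, linked column) pair.
# The result has columns ID, Name, variable, value.
long_df = pd.melt(df_1, id_vars=["ID", "Name"])

# Step 2: left join on the model ID; rows with NaN in 'value' find no match.
# The explicit suffix keeps df_1's Name as-is and renames df_2's to Name_model.
merged = long_df.merge(df_2, how="left", left_on="value",
                       right_on="Model ID", suffixes=("", "_model"))

# Step 3: pivot back to one row per ID and one column per linked column
wide = merged.pivot(index="ID", columns="variable", values="Name_model")

print(long_df.head())
print(wide)
```

Printing `long_df` and `merged` along the way shows what each step contributes before the data is pivoted back to wide format.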
Ben G answered Sep 23 '22 03:09