
How to map one dataframe to another (python pandas)?

Given the two dataframes below, how do I get the intended output dataframe? The long way would be to loop over the rows of df1 with iloc, convert df2 to a dict, and then use map to replace each x and y with its score.

This seems tedious and would be slow on a large dataframe. I'm hoping there's a cleaner solution.

df1:

ID    A    B    C
1     x    x    y
2     y    x    y
3     x    y    y

df2:

ID    score_x    score_y
1          20         30
2          15         17
3          18         22

output:

ID    A     B     C
1     20    20    30
2     17    15    17
3     18    22    22

Note: the dataframes would have many columns and there would be more than just x and y as categories (possibly in the region of 20 categories).

Thanks!

asked Jul 10 '19 at 11:07 by alwayscurious




2 Answers

Use DataFrame.apply along columns with Series.map:

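# Index both frames by ID so rows can be matched by label
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
# Rename 'score_x' -> 'x', 'score_y' -> 'y' to match the values in df1
df2.columns = df2.columns.str.split('_').str[-1]

# For each row of df1 (x.name is its ID), map its values through the matching row of df2
df1 = df1.apply(lambda x: x.map(df2.loc[x.name]), axis=1).reset_index()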

print(df1)
   ID   A   B   C
0   1  20  20  30
1   2  17  15  17
2   3  18  22  22

print(df2)
     x   y
ID        
1   20  30
2   15  17
3   18  22
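
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same apply/map idea. It rebuilds the two frames from the tables in the question; the names cats, scores and mapped are only illustrative, and it avoids inplace=True so the original df1 and df2 are left untouched.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

# Index both frames by ID without mutating the originals.
cats = df1.set_index("ID")
scores = df2.set_index("ID")
# "score_x" -> "x", so the column labels match the category values in df1.
scores.columns = scores.columns.str.split("_").str[-1]

# For each row (row.name is its ID), look up every category
# in that ID's row of scores.
mapped = cats.apply(lambda row: row.map(scores.loc[row.name]), axis=1).reset_index()
print(mapped)
```

Because the lookup uses whatever column labels scores has, this should extend to more categories as long as every category has a matching score_<category> column.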
answered Oct 12 '22 at 10:10 by Space Impact


Using mask:

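# Index both frames by ID so the score columns align row by row
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)

# mask returns a new frame: cells equal to 'x' take that row's score_x, then likewise for 'y'
df1.mask(df1 == 'x', df2['score_x'], axis=0).mask(df1 == 'y', df2['score_y'], axis=0)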

Result:

     A   B   C
ID            
1   20  20  30
2   17  15  17
3   18  22  22

If there are many categories and the score columns are all named the same way (score_<category>), you can use something like this:

for e in df2.columns.str.split('_').str[-1]:
    df1.mask(df1 == e, df2['score_' + e], axis=0, inplace=True)
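
A self-contained sketch of that loop, assuming the frames from the question and score columns that follow the score_<category> pattern. The names cats, scores and result are illustrative only, and the sketch reassigns the result rather than using inplace=True.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

cats = df1.set_index("ID")
scores = df2.set_index("ID")

result = cats.copy()
for col in scores.columns:
    category = col.split("_")[-1]  # "score_x" -> "x"
    # Where the original cell equals this category, take that row's score.
    # axis=0 aligns the score Series with the frame's ID index.
    result = result.mask(cats == category, scores[col], axis=0)
print(result)
```

Comparing against the untouched cats frame on every pass means earlier replacements cannot interfere with later comparisons. The result may keep object dtype because the cells started out as strings; if every cell was matched, result.astype(int) converts it.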
answered Oct 12 '22 at 10:10 by Stef