Given these two dataframes, how do I get the intended output dataframe?
The long way would be to loop through the rows of the dataframe with iloc and then use the map function, after converting df2 to a dict, to map the x and y to their score.
This seems tedious and would take long to run on a large dataframe. I'm hoping there's a cleaner solution.
df1:
ID A B C
1 x x y
2 y x y
3 x y y
df2:
ID score_x score_y
1 20 30
2 15 17
3 18 22
output:
ID A B C
1 20 20 30
2 17 15 17
3 18 22 22
Note: the dataframes would have many columns and there would be more than just x and y as categories (possibly in the region of 20 categories).
Thanks!
Use DataFrame.apply with axis=1 (so the function receives one row at a time) together with Series.map:
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
df2.columns = df2.columns.str.split('_').str[-1]
df1 = df1.apply(lambda x: x.map(df2.loc[x.name]), axis=1).reset_index()
print(df1)
   ID   A   B   C
0   1  20  20  30
1   2  17  15  17
2   3  18  22  22
print(df2)
     x   y
ID
1   20  30
2   15  17
3   18  22
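For anyone who wants to run the apply/map approach above end to end, here is a minimal self-contained sketch. The frames are rebuilt from the question's sample data; the names `labels`, `scores` and `result` are mine, and it assumes every df2 column follows the `score_<category>` pattern.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

# Index both frames by ID so each row of df1 can look up its own scores.
labels = df1.set_index("ID")
scores = df2.set_index("ID")

# Assumption: columns are named score_<category>; keep only the category part.
scores.columns = scores.columns.str.split("_").str[-1]

# For each row, map its labels (x, y, ...) through that ID's score row.
result = labels.apply(lambda row: row.map(scores.loc[row.name]), axis=1).reset_index()
print(result)
```

Working on copies (`labels`, `scores`) instead of `inplace=True` leaves df1 and df2 untouched, which makes it easier to rerun the snippet.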
Using mask:
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
df1.mask(df1=='x',df2['score_x'],axis=0).mask(df1=='y',df2['score_y'],axis=0)
Result:
     A   B   C
ID
1   20  20  30
2   17  15  17
3   18  22  22
If there are many categories and the score columns are all named the same way (score_<category>), you can use something like this:
for e in df2.columns.str.split('_').str[-1]:
    df1.mask(df1==e, df2['score_'+e], axis=0, inplace=True)
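As a rough, self-contained sketch of the mask loop above (sample data taken from the question; `labels`, `scores`, `out` and the `prefix` variable are names I made up, and the `score_<category>` naming is an assumption):

```python
import pandas as pd

# Sample frames rebuilt from the question, indexed by ID.
labels = pd.DataFrame({"ID": [1, 2, 3],
                       "A": ["x", "y", "x"],
                       "B": ["x", "x", "y"],
                       "C": ["y", "y", "y"]}).set_index("ID")
scores = pd.DataFrame({"ID": [1, 2, 3],
                       "score_x": [20, 15, 18],
                       "score_y": [30, 17, 22]}).set_index("ID")

prefix = "score_"  # assumed naming convention for the score columns
out = labels.copy()
for col in scores.columns:
    category = col[len(prefix):]  # "score_x" -> "x"
    # Wherever the original label equals this category, take that ID's score.
    # axis=0 aligns the score Series with the row index (ID).
    out = out.mask(labels == category, scores[col], axis=0)

print(out)
```

The condition is always computed against the original `labels`, so a score that has already been filled in can never be mistaken for a category on a later pass. The loop runs once per category (about 20 in the question), not once per row.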