
How to map one dataframe to another (python pandas)?

Tags:

python

pandas

dataframe

Given these two dataframes, how do I get the intended output dataframe? The long way would be to loop through the rows of the dataframe with iloc and then use the map function after converting df2 to a dict to map the x and y to their score.

This seems tedious and would take a long time to run on a large dataframe. I'm hoping there's a cleaner solution.

df1:

ID    A    B    C
1     x    x    y
2     y    x    y
3     x    y    y

df2:

ID    score_x    score_y
1          20         30
2          15         17
3          18         22

output:

ID    A     B     C
1     20    20    30
2     17    15    17
3     18    22    22

Note: the dataframes would have many columns and there would be more than just x and y as categories (possibly in the region of 20 categories).

Thanks!


asked Jul 10 '19 11:07

alwayscurious

2 Answers

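Use DataFrame.apply row by row (axis=1) together with Series.map:

# Index both frames by ID so each row of df1 can look up its own scores in df2
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
# Rename score_x -> x, score_y -> y so the column labels match the categories in df1
df2.columns = df2.columns.str.split('_').str[-1]

# x is one row of df1 and x.name is its ID; df2.loc[x.name] maps category -> score
df1 = df1.apply(lambda x: x.map(df2.loc[x.name]), axis=1).reset_index()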

print(df1)
   ID   A   B   C
0   1  20  20  30
1   2  17  15  17
2   3  18  22  22
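For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same row-wise approach. The sample frames are copied from the question; the names `labels`, `scores` and `result` are just illustrative, and it assumes every value in df1 has a matching `score_<category>` column in df2.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

# Work on indexed copies so the original frames are left untouched.
labels = df1.set_index("ID")
scores = df2.set_index("ID")

# Rename score_x -> x, score_y -> y so column labels match the categories.
scores.columns = scores.columns.str.split("_").str[-1]

# For each row, fetch that ID's scores and map every label to its score.
result = labels.apply(
    lambda row: row.map(scores.loc[row.name]), axis=1
).reset_index()

print(result)
```

Because this calls a Python function once per row, it may be slow on very large frames; it is mainly convenient when the number of rows is moderate.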


answered Oct 12 '22 10:10

Space Impact

Using mask:

df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)

df1.mask(df1=='x',df2['score_x'],axis=0).mask(df1=='y',df2['score_y'],axis=0)

Result:

     A   B   C
ID            
1   20  20  30
2   17  15  17
3   18  22  22

If there are many categories and the score columns are all named in the same way (score_<category>), you can loop over them like this:

for e in df2.columns.str.split('_').str[-1]:
    df1.mask(df1==e, df2['score_'+e], axis=0, inplace=True)
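A runnable sketch of that loop, using the sample data from the question, might look like the following. It is a variation rather than the exact code above: it avoids `inplace=True` and always compares against the untouched labels, and it assumes every score column is named `score_<category>`. The names `labels`, `scores` and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

labels = df1.set_index("ID")
scores = df2.set_index("ID")

result = labels.copy()
for col in scores.columns:
    category = col.split("_")[-1]  # "score_x" -> "x"
    # Compare against the original labels so earlier replacements
    # cannot affect later comparisons; axis=0 aligns scores on ID.
    result = result.mask(labels == category, scores[col], axis=0)

# Assumption: every cell matched some category, so all values are numeric now.
result = result.astype(int)

print(result)
```

This does one vectorised pass per category, so with around 20 categories it is 20 passes over the frame regardless of how many rows there are.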

answered Oct 12 '22 10:10

Stef
