Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate during merge using pandas

Is there a way to perform calculations between columns while merging pandas DataFrames? For example, say I have the following two DFs, and I want to multiply the data for corresponding ids:

df1 = DataFrame({'id':['a','b','c'], 'data':[8,7,12]})

df2 = DataFrame({'id':['b','a','c'], 'data':[3,2,1]})

The desired output would be:

  id  data
0  a    16
1  b    21
2  c    12

I know it is possible to do this by merging, and multiplying like so:

merged = pd.merge(df1, df2, how='left', on=['id'])
merged['data'] = merged.data_x * merged.data_y
merged[['id', 'data']]

However, for large numbers of columns, this is quite cumbersome, especially since the merged column names have the _x and _y suffixes. What I would like is basically the pandas equivalent of the following SQL:

SELECT a.id, a.data * b.data AS 'data'
FROM table a
LEFT JOIN table b 
ON a.id = b.id
like image 497
aensm Avatar asked Oct 23 '25 15:10

aensm


1 Answers

In this case, you could simply set the index and then multiply:

>>> df1 = pd.DataFrame({'id':['a','b','c'], 'data':[8,7,12]})
>>> df2 = pd.DataFrame({'id':['b','a','c'], 'data':[3,2,1]})
>>> df1.set_index("id") * df2.set_index("id")
    data
id      
a     16
b     21
c     12

with a reset_index if you like:

>>> (df1.set_index("id") * df2.set_index("id")).reset_index()
  id  data
0  a    16
1  b    21
2  c    12
like image 72
DSM Avatar answered Oct 25 '25 04:10

DSM