Is there a way to perform calculations between columns while merging pandas DataFrames? For example, say I have the following two DFs, and I want to multiply the data for corresponding ids:
df1 = DataFrame({'id':['a','b','c'], 'data':[8,7,12]})
df2 = DataFrame({'id':['b','a','c'], 'data':[3,2,1]})
The desired output would be:
id data
0 a 16
1 b 21
2 c 12
I know it is possible to do this by merging, and multiplying like so:
merged = pd.merge(df1, df2, how='left', on=['id'])
merged['data'] = merged.data_x * merged.data_y
merged[['id', 'data']]
However, for large numbers of columns, this is quite cumbersome, especially since the merged column names have the _x and _y suffixes. What I would like is basically the pandas equivalent of the following SQL:
SELECT a.id, a.data * b.data AS 'data'
FROM table a
LEFT JOIN table b
ON a.id = b.id
In this case, you could simply set the index and then multiply:
>>> df1 = pd.DataFrame({'id':['a','b','c'], 'data':[8,7,12]})
>>> df2 = pd.DataFrame({'id':['b','a','c'], 'data':[3,2,1]})
>>> df1.set_index("id") * df2.set_index("id")
data
id
a 16
b 21
c 12
with a reset_index if you like:
>>> (df1.set_index("id") * df2.set_index("id")).reset_index()
id data
0 a 16
1 b 21
2 c 12
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With