I want to merge two dataframes on specific columns (key1, key2) and sum up the values for another column (value).
>>> df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
key1 key2 value
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
>>> df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6), 'noise': range(2, 6), 'value': range(10, 14)})
key1 key2 noise value
0 2 2 2 10
1 3 3 3 11
2 4 4 4 12
3 5 5 5 13
I want this result:
key1 key2 value
0 0 0 0
1 1 1 1
2 2 2 12
3 3 3 14
4 4 4 12
5 5 5 13
In SQL terms, I want:
SELECT df1.key1, df1.key2, df1.value + df2.value AS value
FROM df1 OUTER JOIN df2 ON key1, key2
I tried two approaches:
approach 1
concatenated = pd.concat([df1, df2])
grouped = concatenated.groupby(['key1', 'key2'], as_index=False)
summed = grouped.agg(np.sum)
result = summed[['key1', 'key2', 'value']]
approach 2
joined = pd.merge(df1, df2, how='outer', on=['key1', 'key2'], suffixes=['_1', '_2'])
joined = joined.fillna(0.0)
joined['value'] = joined['value_1'] + joined['value_2']
result = joined[['key1', 'key2', 'value']]
Both approaches give the result I want, but I wonder if there is a simpler way.
It is possible to join the different columns is using concat() method.
We can join columns from two Dataframes using the merge() function. This is similar to the SQL 'join' functionality. A detailed discussion of different join types is given in the SQL lesson. You specify the type of join you want using the how parameter.
The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
I don't know about simpler, but you can get a little more concise:
>>> pd.concat([df1, df2]).groupby(["key1", "key2"], as_index=False)["value"].sum()
key1 key2 value
0 0 0 0
1 1 1 1
2 2 2 12
3 3 3 14
4 4 4 12
5 5 5 13
Depending on your tolerance for chaining ops, you might want to break this onto multiple lines anyway, though (four tends to be close to my upper limit, in this case concat-groupby-select-sum).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With