I have two dataframes like these:
df1 = pd.DataFrame({'A': [1,0,3], 'B':[0,0,1], 'C':[0,2,2]}, index =['a','b','c'])
df2 = pd.DataFrame({'A': [0,0], 'B':[2,1]}, index =['a','c'])
df1 and df2:
| A | B | C | | A | B |
---|---|---|---| ---|---|---|
a | 1 | 0 | 0 | a | 0 | 2 |
b | 0 | 0 | 2 | c | 0 | 1 |
c | 3 | 1 | 2 |
And the expected result is:
| A | B | C |
---|---|---|---|
a | 1 | 2 | 0 |
b | 0 | 0 | 2 |
c | 3 | 2 | 2 |
I'm having problems with this because there may be missing columns/rows in any of the dataframes (df1 may not have all columns and rows df2 has)
To merge two Pandas DataFrame with common column, use the merge() function and set the ON parameter as the column name.
How do I join two DataFrames based on two columns? The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
Pandas' merge and concat can be used to combine subsets of a DataFrame, or even data from different files. join function combines DataFrames based on index or column. Joining two DataFrames can be done in multiple ways (left, right, and inner) depending on what data must be in the final DataFrame.
Going by the idea in the answer for this question - merge 2 dataframes in Pandas: join on some columns, sum up others
Since in your case, the indexes are the ones that are common, you can use pandas.concat()
for the two DataFrames, then DataFrame.groupby
based on the index, and then take sum on it. Example -
In [27]: df1
Out[27]:
A B C
a 1 0 0
b 0 0 2
c 3 1 2
In [28]: df2
Out[28]:
A B
a 0 2
c 0 1
In [29]: pd.concat([df1,df2]).groupby(level=0).sum()
Out[29]:
A B C
a 1 2 0
b 0 0 2
c 3 2 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With