I have multiple dataframes each with a multi-level-index and a value column. I want to add up all the dataframes on the value columns.
df1 + df2
Not every index entry is present in each dataframe, so I get NaN for any row that is missing from at least one of the dataframes.
How can I overcome this and treat a row that is missing from a dataframe as having a value of 0?
E.g. I want to get

       val
    a    2
    b    4
    c    3
    d    3

from

    pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}}) + pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})

instead of

       val
    a    2
    b    4
    c  NaN
    d  NaN
The sum() method adds up the values in each column of a DataFrame and returns one total per column. With axis='columns' (equivalently axis=1), it instead adds across the columns and returns one total per row.
To sum all the columns of a Pandas DataFrame into a new column, call sum() with axis=1 and assign the result to the new column.
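To make the axis argument concrete, here is a minimal sketch on a made-up two-column frame (the names df, x, y, col_totals and row_totals are illustrative and do not come from the question). Note that sum() works within a single DataFrame, so on its own it does not line up rows from several dataframes.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the axis argument.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Default axis (0 / 'index'): one total per column.
col_totals = df.sum()

# axis='columns' (same as axis=1): one total per row.
row_totals = df.sum(axis="columns")

print(col_totals)
print(row_totals)
```

Assigning row_totals to a new column, for example df["total"] = row_totals, gives the "sum all columns into a new column" result described above.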
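Use the add method with the fill_value=0 parameter. A row that is missing from one of the two dataframes is then treated as 0 instead of producing NaN.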
    import pandas as pd

    df1 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}})
    df2 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})

    # Labels missing from one frame count as 0 before the addition
    df1.add(df2, fill_value=0)

       val
    a  2.0
    b  4.0
    c  3.0
    d  3.0
    import numpy as np
    import pandas as pd

    # Two frames whose MultiIndexes only partly overlap
    idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
    idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])

    np.random.seed([3, 1415])
    df1 = pd.DataFrame(np.random.randn(4, 1), idx1, ['val'])
    df2 = pd.DataFrame(np.random.randn(4, 1), idx2, ['val'])

    df1

              val
    a A -2.129724
      B -1.268466
    b A -1.970500
      D -2.259055

    df2

              val
    a A -0.349286
      C -0.026955
    b A  0.316236
      C  0.348782

    df1.add(df2, fill_value=0)

              val
    a A -2.479011
      B -1.268466
      C -0.026955
    b A -1.654264
      C  0.348782
      D -2.259055
    from functools import reduce

    import pandas as pd

    df1 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}})
    df2 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})
    df3 = pd.DataFrame({'val': {'e': 1, 'c': 2, 'd': 3}})
    df4 = pd.DataFrame({'val': {'f': 1, 'a': 2, 'd': 3}})
    df5 = pd.DataFrame({'val': {'g': 1, 'f': 2, 'd': 3}})

    # Fold the list pairwise, treating missing labels as 0 at every step
    reduce(lambda a, b: a.add(b, fill_value=0), [df1, df2, df3, df4, df5])

        val
    a   4.0
    b   4.0
    c   5.0
    d  12.0
    e   1.0
    f   3.0
    g   1.0
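For many dataframes, a different technique than chaining add is to stack them with pd.concat and sum the rows that share an index label. The following is only a sketch of that alternative, not something taken from the answers above; the names frames, stacked, levels and total are illustrative. Grouping on every index level is an assumption made here so that the same lines would also apply to a multi-level index.

```python
import pandas as pd

# Three of the example frames from above; the list name is illustrative.
frames = [
    pd.DataFrame({"val": {"a": 1, "b": 2, "c": 3}}),
    pd.DataFrame({"val": {"a": 1, "b": 2, "d": 3}}),
    pd.DataFrame({"val": {"e": 1, "c": 2, "d": 3}}),
]

# Stack all rows; labels that occur in several frames now appear several times.
stacked = pd.concat(frames)

# Group on every index level (one level here, several for a MultiIndex)
# and add up the rows that share the same label.
levels = list(range(stacked.index.nlevels))
total = stacked.groupby(level=levels).sum()

print(total)
```

Because no NaN is ever introduced in this sketch, a label that is absent from a frame simply contributes nothing, and the totals stay integers rather than being converted to floats.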