Pandas sum multiple dataframes

I have multiple dataframes each with a multi-level-index and a value column. I want to add up all the dataframes on the value columns.

df1 + df2

Not every index entry is present in each dataframe, so I get NaN for any row that is missing from at least one of the dataframes.

How can I overcome this and treat rows that are missing from a dataframe as having a value of 0?

E.g. I want to get

       val
    a    2
    b    4
    c    3
    d    3

from

    pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}}) + pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})

instead of

       val
    a    2
    b    4
    c  NaN
    d  NaN
asked Jul 20 '16 04:07 by hangc


1 Answer

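Use the add method with the fill_value=0 parameter. A row that is missing from one of the dataframes is then treated as 0 instead of producing NaN.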

    import pandas as pd

    df1 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}})
    df2 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})

    df1.add(df2, fill_value=0)

       val
    a  2.0
    b  4.0
    c  3.0
    d  3.0
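One detail worth checking is how fill_value behaves when a value is NaN in both frames. The sketch below uses made-up frames named left and right (not from the question) and assumes the documented pandas behavior that fill_value only substitutes for a value missing on one side.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'b' is NaN in both frames, 'c' and 'd' each exist in only one.
left = pd.DataFrame({"val": {"a": 1.0, "b": np.nan, "c": 3.0}})
right = pd.DataFrame({"val": {"a": 1.0, "b": np.nan, "d": 3.0}})

total = left.add(right, fill_value=0)

# 'c' and 'd' are missing on one side only, so that side counts as 0.
# 'b' is missing on both sides, so the result for 'b' stays NaN.
print(total)
```

If rows that are NaN everywhere should also count as 0, calling fillna(0) on the result is one option. The result is float because alignment introduces missing values before they are filled; when no NaN remains, astype(int) can restore integers.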

MultiIndex example

    import numpy as np
    import pandas as pd

    idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
    idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])

    np.random.seed([3, 1415])
    df1 = pd.DataFrame(np.random.randn(4, 1), idx1, ['val'])
    df2 = pd.DataFrame(np.random.randn(4, 1), idx2, ['val'])

    df1

              val
    a A -2.129724
      B -1.268466
    b A -1.970500
      D -2.259055

    df2

              val
    a A -0.349286
      C -0.026955
    b A  0.316236
      C  0.348782

    df1.add(df2, fill_value=0)

              val
    a A -2.479011
      B -1.268466
      C -0.026955
    b A -1.654264
      C  0.348782
      D -2.259055
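To make the alignment step explicit, here is a rough sketch of what add with fill_value=0 amounts to for two frames without NaN of their own: reindex both onto the union of their indexes, filling gaps with 0, then add. The values are made up for readability, and the names full_index, manual and via_add are only illustrative.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_tuples([("a", "A"), ("a", "B"), ("b", "A"), ("b", "D")])
idx2 = pd.MultiIndex.from_tuples([("a", "A"), ("a", "C"), ("b", "A"), ("b", "C")])
df1 = pd.DataFrame({"val": [1.0, 2.0, 3.0, 4.0]}, index=idx1)
df2 = pd.DataFrame({"val": [10.0, 20.0, 30.0, 40.0]}, index=idx2)

# Union of both MultiIndexes: every (level 0, level 1) pair seen in either frame.
full_index = df1.index.union(df2.index)

# Spell out the alignment by hand: missing rows become 0, then plain addition works.
manual = df1.reindex(full_index, fill_value=0) + df2.reindex(full_index, fill_value=0)

# The one-call version for comparison.
via_add = df1.add(df2, fill_value=0)

print(manual)
print(via_add)
```

Both results should hold the same six rows. The add call is shorter and avoids building the union index by hand.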

More than 2 dataframes

    from functools import reduce

    import pandas as pd

    df1 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}})
    df2 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})
    df3 = pd.DataFrame({'val': {'e': 1, 'c': 2, 'd': 3}})
    df4 = pd.DataFrame({'val': {'f': 1, 'a': 2, 'd': 3}})
    df5 = pd.DataFrame({'val': {'g': 1, 'f': 2, 'd': 3}})

    # Fold the list pairwise, treating rows missing from either side as 0
    reduce(lambda a, b: a.add(b, fill_value=0), [df1, df2, df3, df4, df5])

        val
    a   4.0
    b   4.0
    c   5.0
    d  12.0
    e   1.0
    f   3.0
    g   1.0
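A different technique from the reduce approach, sketched here as an alternative rather than as part of the original answer: stack all the frames with pd.concat and sum the rows that share an index key with groupby. The variable names frames, stacked and total are illustrative only.

```python
import pandas as pd

frames = [
    pd.DataFrame({"val": {"a": 1, "b": 2, "c": 3}}),
    pd.DataFrame({"val": {"a": 1, "b": 2, "d": 3}}),
    pd.DataFrame({"val": {"e": 1, "c": 2, "d": 3}}),
]

# Stack the frames vertically; index keys may now repeat.
stacked = pd.concat(frames)

# Group on every index level so the same call also covers a MultiIndex,
# then sum the rows that share a key.
all_levels = list(range(stacked.index.nlevels))
total = stacked.groupby(level=all_levels).sum()

print(total)
```

Because no alignment step introduces NaN here, integer columns stay integers. Note that groupby sum skips NaN values by default, so a key whose values are all NaN sums to 0 rather than NaN.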
answered Oct 24 '22 23:10 by piRSquared