 

How to get the average of DataFrame column values

                 A      B
DATE
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
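To experiment with this, the frame can be rebuilt in code. This is a sketch that assumes the DATE labels are daily dates used as the index and that the blank cells in column A are NaN:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; DATE becomes a named DatetimeIndex.
dates = pd.date_range("2013-05-01", "2013-05-10", freq="D", name="DATE")
df = pd.DataFrame(
    {
        # Column A has values only for the first five days (assumed NaN after).
        "A": [473077, 35131, 727, 481, 226, np.nan, np.nan, np.nan, np.nan, np.nan],
        "B": [71333, 62441, 27381, 1206, 1733, 4064, 41151, 8144, 23, 10],
    },
    index=dates,
)
print(df)
```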

Say I have the DataFrame above. What is the easiest way to get a Series with the same index that is the row-wise average of columns A and B? The average needs to ignore NaN values. The twist is that the solution needs to be flexible to the addition of new columns to the DataFrame.

The closest I have come is:

df.sum(axis=1) / len(df.columns) 

However, this does not seem to ignore the NaN values.
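The problem with that attempt is the denominator, not the sum: DataFrame.sum() already skips NaN by default, but len(df.columns) counts every column, including ones that are NaN in a given row. A minimal sketch, using three rows of the question's data, that divides by DataFrame.count() (the number of non-NaN cells per row) instead:

```python
import numpy as np
import pandas as pd

# Three rows from the question's frame; the last row has NaN in column A.
df = pd.DataFrame(
    {"A": [481, 226, np.nan], "B": [1206, 1733, 4064]},
    index=pd.to_datetime(["2013-05-04", "2013-05-05", "2013-05-06"]),
)

# sum() skips NaN, but len(df.columns) is always 2, so the last row is halved.
wrong = df.sum(axis=1) / len(df.columns)

# count(axis=1) counts only the non-NaN cells in each row.
right = df.sum(axis=1) / df.count(axis=1)

print(wrong)
print(right)
```

Dividing by the per-row count gives the same result as df.mean(axis=1), which handles this for you.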

(Note: I am still a bit new to the pandas library, so I'm guessing there's an obvious way to do this that my limited brain is simply not seeing.)

asked May 22 '13 at 10:05 by badideas



People also ask

How do you calculate average in pandas?

Algorithm:
Step 1: Define a pandas Series.
Step 2: Use the mean() function to calculate the mean.
Step 3: Print the mean.
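The three steps map directly onto pandas calls. A minimal sketch, borrowing a few values from column B of the question purely as sample data:

```python
import pandas as pd

# Step 1: define a pandas Series.
s = pd.Series([71333, 62441, 27381, 1206, 1733])

# Step 2: mean() on a Series returns a single number.
m = s.mean()

# Step 3: print the mean.
print(m)
```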

How do you find the mean of a DataFrame in Python?

To find the mean of a DataFrame, use the pandas DataFrame.mean() function. DataFrame.mean() returns the mean of the values over the requested axis.

What is average in pandas?

The pandas mean() method returns the average of your data across a specified axis. If it is applied to a DataFrame, pandas returns a Series with the mean along that axis. If .mean() is applied to a Series, pandas returns a scalar (a single number).
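The axis argument decides which way the average runs, and that is exactly what the question needs. A short sketch on three rows of the question's data, contrasting column means (the default, axis=0), row means (axis=1), and the scalar returned for a single column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [481, 226, np.nan], "B": [1206, 1733, 4064]})

# axis=0 (the default): one mean per column, as a Series indexed by column name.
col_means = df.mean()

# axis=1: one mean per row, as a Series sharing df's index.
row_means = df.mean(axis=1)

# A single column is a Series, so its mean is a scalar.
a_mean = df["A"].mean()

print(col_means)
print(row_means)
print(a_mean)
```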


1 Answer

Simply using df.mean(axis=1) will Do The Right Thing(tm) with respect to NaNs, because mean() skips them by default (skipna=True):

>>> df
                 A      B
DATE
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
>>> df.mean(axis=1)
DATE
2013-05-01    272205.0
2013-05-02     48786.0
2013-05-03     14054.0
2013-05-04       843.5
2013-05-05       979.5
2013-05-06      4064.0
2013-05-07     41151.0
2013-05-08      8144.0
2013-05-09        23.0
2013-05-10        10.0
dtype: float64
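To see what the NaN handling buys you, compare the default with skipna=False. A small sketch on three rows of the question's data: with skipping, the 2013-05-06 row averages only its one real value; without it, that row becomes NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [481, 226, np.nan], "B": [1206, 1733, 4064]})

# Default skipna=True: NaN cells are left out before each row is averaged.
skipped = df.mean(axis=1)

# skipna=False: any NaN in a row makes that row's mean NaN.
strict = df.mean(axis=1, skipna=False)

print(skipped)
print(strict)
```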

You can use df[["A", "B"]].mean(axis=1) if there are other columns to ignore.
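If new columns keep being added and some of them are not numeric, listing names by hand stops scaling. A sketch of two alternatives, assuming a hypothetical numeric column C and a hypothetical text column note: pass numeric_only=True, or select numeric columns with select_dtypes() first. In recent pandas versions (2.0 and later) numeric_only defaults to False, so a text column would otherwise make df.mean(axis=1) raise an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [481, 226, np.nan],
        "B": [1206, 1733, 4064],
        "C": [100, np.nan, 36],   # hypothetical column added later
        "note": ["x", "y", "z"],  # hypothetical non-numeric column
    }
)

# Option 1: let mean() drop non-numeric columns itself.
flexible = df.mean(axis=1, numeric_only=True)

# Option 2: pick the numeric columns explicitly, then average each row.
numeric = df.select_dtypes(include="number")
explicit = numeric.mean(axis=1)

print(flexible)
```

Both options pick up any numeric column added later without code changes, and NaN cells are still skipped.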

answered Sep 22 '22 at 08:09 by DSM
