                 A      B
DATE
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
Say I have the dataframe above. What is the easiest way to get a series with the same index that is the average of columns A and B? The average needs to ignore NaN values. The twist is that the solution needs to keep working when new columns are added to the dataframe.
The closest I have come was
df.sum(axis=1) / len(df.columns)
However, this does not seem to ignore the NaN values.
(Note: I am still a bit new to the pandas library, so I'm guessing there's an obvious way to do this that my limited brain is simply not seeing.)
Algorithm. Step 1: Define a pandas Series. Step 2: Call its mean() method to calculate the mean. Step 3: Print the result.
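The three steps above can be sketched roughly as follows. The values are only illustrative (borrowed from column B of the question), and the variable names are made up for this example.

```python
import pandas as pd

# Step 1: define a pandas Series (illustrative values only)
s = pd.Series([481, 1206, 226, 1733])

# Step 2: compute the mean of the Series
series_mean = s.mean()

# Step 3: print the result
print(series_mean)
```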
To find the mean of a DataFrame, use the pandas DataFrame.mean() function. DataFrame.mean() returns the mean of the values along the requested axis.
DataFrame.mean() returns the average of your data across a specified axis. Applied to a DataFrame, it returns a Series with the mean along that axis. Applied to a Series, .mean() returns a scalar (a single number).
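A small sketch of the difference between the two axes and between DataFrame and Series, using three rows taken from the question's data. The names col_means, row_means and b_mean are just labels chosen for this example.

```python
import numpy as np
import pandas as pd

# Three rows from the question's frame, including one NaN in column A
df = pd.DataFrame(
    {"A": [481, 226, np.nan], "B": [1206, 1733, 4064]},
    index=pd.to_datetime(["2013-05-04", "2013-05-05", "2013-05-06"]),
)
df.index.name = "DATE"

# Default axis=0: one mean per column, returned as a Series indexed by column name
col_means = df.mean()

# axis=1: one mean per row, returned as a Series sharing the DataFrame's index
row_means = df.mean(axis=1)

# Series.mean() collapses to a single scalar
b_mean = df["B"].mean()

print(col_means)
print(row_means)
print(b_mean)
```

In both DataFrame cases NaN entries are skipped by default (skipna=True), so the 2013-05-06 row mean is computed from column B alone.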
Simply using df.mean(axis=1)
will Do The Right Thing(tm) with respect to NaNs, since they are skipped by default:
>>> df
                 A      B
DATE
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
>>> df.mean(axis=1)
DATE
2013-05-01    272205.0
2013-05-02     48786.0
2013-05-03     14054.0
2013-05-04       843.5
2013-05-05       979.5
2013-05-06      4064.0
2013-05-07     41151.0
2013-05-08      8144.0
2013-05-09        23.0
2013-05-10        10.0
dtype: float64
You can use df[["A", "B"]].mean(axis=1)
if there are other columns to ignore.
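To make the difference from the attempt in the question concrete, here is a minimal sketch on two rows of the same data. Column C is hypothetical, added only to show what happens when the frame grows. As far as I can tell, the sum(axis=1) / len(df.columns) version does skip NaN in the sum, but it still divides by the full column count, which is why the result looks wrong for rows with missing values.

```python
import numpy as np
import pandas as pd

# Two rows from the question: one complete, one with NaN in column A
df = pd.DataFrame(
    {"A": [226, np.nan], "B": [1733, 4064]},
    index=pd.to_datetime(["2013-05-05", "2013-05-06"]),
)

# The attempt from the question: NaN is skipped in the sum,
# but the divisor still counts every column
naive = df.sum(axis=1) / len(df.columns)

# mean(axis=1) divides by the number of non-NaN values in each row
nan_aware = df.mean(axis=1)

# Hypothetical new column: mean(axis=1) picks it up automatically
df["C"] = [10.0, np.nan]
all_cols = df.mean(axis=1)

# Restrict to A and B when other columns should be ignored
only_ab = df[["A", "B"]].mean(axis=1)

print(naive)
print(nan_aware)
print(all_cols)
print(only_ab)
```

For the 2013-05-06 row, the first version gives 2032.0 while mean(axis=1) gives 4064.0.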