Standard error ignoring NaN in pandas groupby groups

I have data loaded into a DataFrame that has a MultiIndex for the column headers. Currently I group the data by the column index levels to take the mean of each group and calculate the 95% confidence intervals, like this:

import numpy as np
import pandas as pd
from scipy import stats as st

# Normalize to the starting point, then convert units
normalized = (data - data.iloc[0]) * 11.11111
# Group normalized data based on slope and depth
grouped = normalized.groupby(level=['SLOPE', 'DEPTH'], axis=1)
# Obtain the mean of each group
means = grouped.mean()
# Calculate the 95% confidence interval for each group
ci = grouped.aggregate(lambda x: st.sem(x) * 1.96)

The problem is that the mean function used on the groups ignores NaN values, while the scipy function st.sem returns NaN if there is a NaN anywhere in the group. I need to calculate the standard error while ignoring NaNs, the same way the mean function does.
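The mismatch can be seen on a small made-up Series with one missing value (the data below is illustrative, not from the question): pandas' mean skips the NaN, while st.sem with its default NaN handling propagates it.

```python
import numpy as np
import pandas as pd
from scipy import stats as st

# A small group with one missing value
s = pd.Series([1.0, 2.0, np.nan, 3.0])

print(s.mean())   # pandas skips the NaN -> 2.0
print(st.sem(s))  # scipy propagates the NaN -> nan
```

(Later scipy versions also accept `st.sem(s, nan_policy='omit')` to skip NaNs directly, but that option was not available at the time of the question.)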

I've tried going about calculating the 95% confidence interval like this:

#Calculate 95% confidence interval for each group
ci = grouped.aggregate(lambda x: np.std(x) / ??? * 1.96)

np.std applied to a Series will give me the standard deviation ignoring NaN values, but I need to divide this by the square root of the group size, ignoring NaNs, in order to get the standard error.

What is the easiest way to calculate the standard error while ignoring NaNs?

asked Aug 04 '13 by pbreach

1 Answer

The count() method of a Series returns the number of non-NaN values:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 3])
print(s.count())

output:

3

So, try:

ci = grouped.aggregate(lambda x: np.std(x) / x.count() * 1.96)
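One detail worth checking (a note on degrees of freedom, not part of the original answer): np.std defaults to ddof=0 (population standard deviation), while scipy's st.sem uses ddof=1 (sample). To reproduce st.sem exactly while skipping NaNs, pandas' own Series.std (ddof=1 by default) divided by the square root of the non-NaN count can be used; a small sketch on made-up data:

```python
import numpy as np
import pandas as pd
from scipy import stats as st

s = pd.Series([1.0, 2.0, np.nan, 3.0])

# NaN-aware standard error: sample std (ddof=1) over sqrt of non-NaN count
se = s.std() / np.sqrt(s.count())

# Matches scipy's sem computed on the non-NaN values
print(se, st.sem(s.dropna()))
```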
answered Sep 26 '22 by HYRY