Standard error ignoring NaN in pandas groupby groups

I have data loaded into a DataFrame that has a MultiIndex for the column headers. Currently I group the data by the column index levels to take the mean of each group and calculate the 95% confidence intervals, like this:

import numpy as np
import pandas as pd
from scipy import stats as st

# Normalize to the starting point, then convert units
normalized = (data - data.iloc[0]) * 11.11111
# Group normalized data based on slope and depth
grouped = normalized.groupby(level=['SLOPE', 'DEPTH'], axis=1)
# Obtain the mean of each group
means = grouped.mean()
# Calculate the 95% confidence interval for each group
ci = grouped.aggregate(lambda x: st.sem(x) * 1.96)

The problem is that the mean function used on the groups ignores NaN values, while the scipy function st.sem returns NaN if there is a NaN anywhere in the group. I need to calculate the standard error while ignoring NaNs, the same way the mean function does.
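The mismatch can be seen on a small made-up Series with one missing value (the data below is illustrative, not from the question): pandas' mean skips the NaN, while st.sem with its default NaN handling propagates it.

```python
import numpy as np
import pandas as pd
from scipy import stats as st

# A small group with one missing value
s = pd.Series([1.0, 2.0, np.nan, 3.0])

print(s.mean())   # pandas skips the NaN -> 2.0
print(st.sem(s))  # scipy propagates the NaN -> nan
```

(Later scipy versions also accept `st.sem(s, nan_policy='omit')` to skip NaNs directly, but that option was not available at the time of the question.)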

I've tried going about calculating the 95% confidence interval like this:

#Calculate 95% confidence interval for each group
ci = grouped.aggregate(lambda x: np.std(x) / ??? * 1.96)

np.std applied to a Series will give me the standard deviation ignoring NaN values, but I need to divide this by the square root of the group size, ignoring NaNs, in order to get the standard error.

What is the easiest way to calculate the standard error while ignoring NaNs?

asked Aug 04 '13 by pbreach

1 Answer

The count() method of a Series returns the number of non-NaN values:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 3])
print(s.count())

output:

3

So, try:

ci = grouped.aggregate(lambda x: np.std(x) / x.count() * 1.96)
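One detail worth checking (a note on degrees of freedom, not part of the original answer): np.std defaults to ddof=0 (population standard deviation), while scipy's st.sem uses ddof=1 (sample). To reproduce st.sem exactly while skipping NaNs, pandas' own Series.std (ddof=1 by default) divided by the square root of the non-NaN count can be used; a small sketch on made-up data:

```python
import numpy as np
import pandas as pd
from scipy import stats as st

s = pd.Series([1.0, 2.0, np.nan, 3.0])

# NaN-aware standard error: sample std (ddof=1) over sqrt of non-NaN count
se = s.std() / np.sqrt(s.count())

# Matches scipy's sem computed on the non-NaN values
print(se, st.sem(s.dropna()))
```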
answered Sep 26 '22 by HYRY