T-Test in Scipy with NaN values

I have a problem with doing a t-test in scipy that's driving me slowly crazy. It should be simple to resolve, but nothing I do works and there's no solution I can find through extensive searching. I'm using Spyder on the latest distribution of Anaconda.

Specifically: I want to compare means between two columns, 'Trait_A' and 'Trait_B', in a pandas dataframe that I've imported from a csv file. Some of the values in one of the columns are 'NaN' ('Not a Number'). By default, scipy's independent-samples t-test function doesn't accommodate 'NaN' values. However, setting the 'nan_policy' parameter to 'omit' should deal with this. Nevertheless, when I do, the test statistic and p-value come back as 'NaN'. When I restrict the range of values covered to actual numbers, the test works fine. My data and code are below; can anyone suggest what I'm doing wrong? Thanks!

Data:

     Trait_A   Trait_B
0   1.714286  0.000000
1   4.275862  4.000000
2   0.500000  4.625000
3   1.000000  0.000000
4   1.000000  4.000000
5   1.142857  1.000000
6   2.000000  1.000000
7   9.416667  1.956522
8   2.052632  0.571429
9   2.100000  0.166667
10  0.666667  0.000000
11  2.333333  1.705882
12  2.768145       NaN
13  0.000000       NaN
14  6.333333       NaN
15  0.928571       NaN

My code:

import pandas as pd
import scipy.stats as sp

data = pd.read_csv("filepath/Data2.csv")
# Independent-samples t-test; NaN values are supposed to be ignored
print(sp.ttest_ind(data['Trait_A'], data['Trait_B'], nan_policy='omit'))

My result:

Ttest_indResult(statistic=nan, pvalue=nan)

asked May 04 '16 08:05

Lodore66

2 Answers

This looks like a bug. As a workaround, you can drop the NaNs before passing the columns to the t-test:

sp.ttest_ind(data.dropna()['Trait_A'], data.dropna()['Trait_B'])

Ttest_indResult(statistic=0.88752464718609214, pvalue=0.38439692093551037)
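
One caveat with the snippet above: data.dropna() removes every row that has a NaN in either column, so the four valid 'Trait_A' values in rows 12 to 15 are discarded as well. Below is a minimal sketch of dropping NaNs per column instead, which is closer to what nan_policy='omit' is meant to do. The DataFrame is rebuilt by hand from the question's table rather than read from the csv, and the variable names are illustrative only:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Data copied from the question's table (assumption: typed in by hand,
# not loaded from the original Data2.csv).
data = pd.DataFrame({
    "Trait_A": [1.714286, 4.275862, 0.5, 1.0, 1.0, 1.142857, 2.0, 9.416667,
                2.052632, 2.1, 0.666667, 2.333333, 2.768145, 0.0, 6.333333,
                0.928571],
    "Trait_B": [0.0, 4.0, 4.625, 0.0, 4.0, 1.0, 1.0, 1.956522,
                0.571429, 0.166667, 0.0, 1.705882] + [np.nan] * 4,
})

# Row-wise drop: any row with a NaN in either column is removed.
rowwise = data.dropna()
rowwise_result = stats.ttest_ind(rowwise["Trait_A"], rowwise["Trait_B"])

# Per-column drop: each sample keeps all of its own valid values.
a = data["Trait_A"].dropna()
b = data["Trait_B"].dropna()
per_column_result = stats.ttest_ind(a, b)

print(len(rowwise), len(a), len(b))
print(rowwise_result)
print(per_column_result)
```

With this data, the row-wise version compares 12 values against 12, while the per-column version compares 16 values against 12.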

answered Sep 19 '22 08:09

ayhan

The bug is at line 3885 of scipy/scipy/stats/stats.py:

# check both a and b
contains_nan, nan_policy = (_contains_nan(a, nan_policy) or
                            _contains_nan(b, nan_policy))

It should be:

contains_nan             = (_contains_nan(a, nan_policy)[0] or
                            _contains_nan(b, nan_policy)[0])

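Each call to _contains_nan returns a tuple, and a non-empty tuple is always truthy, so the 'or' short-circuits on the result for a and b is never checked. Swapping 'Trait_A' and 'Trait_B' in your call therefore works around the problem.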
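
The short-circuit behaviour behind this bug can be reproduced without scipy. The sketch below uses a hypothetical stand-in, contains_nan_sketch, which only mimics the (flag, policy) tuple return shape implied by the quoted lines; it is not scipy's actual helper:

```python
import math


def contains_nan_sketch(values, nan_policy):
    """Hypothetical stand-in: returns a (contains_nan, nan_policy) tuple."""
    return any(math.isnan(v) for v in values), nan_policy


a = [1.0, 2.0]           # no NaN, like 'Trait_A'
b = [1.0, float("nan")]  # has a NaN, like 'Trait_B'

# Buggy pattern: `or` is applied to two tuples. A non-empty tuple is always
# truthy, so the first tuple is returned and b is never inspected.
buggy_flag, _ = contains_nan_sketch(a, "omit") or contains_nan_sketch(b, "omit")

# Fixed pattern: `or` is applied to the boolean flags themselves.
fixed_flag = (contains_nan_sketch(a, "omit")[0] or
              contains_nan_sketch(b, "omit")[0])

# Swapping the arguments hides the bug: the NaN-containing sample now comes
# first, so its tuple is the one returned.
swapped_flag, _ = contains_nan_sketch(b, "omit") or contains_nan_sketch(a, "omit")

print(buggy_flag, fixed_flag, swapped_flag)
```

This is why the NaNs in the second argument go undetected while the same data passed in the opposite order is handled correctly.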

answered Sep 18 '22 08:09

B. M.

Tags:

python

numpy

anaconda

scipy