How to perform two-sample one-tailed t-test with numpy/scipy

In R, it is possible to perform a two-sample one-tailed t-test simply by using

> A = c(0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846)
> B = c(0.6383447, 0.5271385, 1.7721380, 1.7817880)
> t.test(A, B, alternative="greater")

    Welch Two Sample t-test

data:  A and B 
t = -0.4189, df = 6.409, p-value = 0.6555
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -1.029916       Inf 
sample estimates:
mean of x mean of y 
0.9954942 1.1798523 

In the Python world, scipy provides a similar function, ttest_ind, but it can only do two-tailed t-tests. The closest information on the topic I found is this link, but it seems to be more a discussion of the policy of implementing one-tailed vs. two-tailed tests in scipy than a set of instructions.

Therefore, my question is: does anyone know of any examples or instructions on how to perform a one-tailed version of the test using numpy/scipy?

asked Apr 13 '13 by Timo

2 Answers

From your mailing list link:

because the one-sided tests can be backed out from the two-sided tests. (With symmetric distributions the one-sided p-value is just half of the two-sided p-value.)

It goes on to say that scipy always gives the test statistic as signed. This means that given p and t values from a two-tailed test, you would reject the null hypothesis of a greater-than test when p/2 < alpha and t > 0, and of a less-than test when p/2 < alpha and t < 0.
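A minimal sketch of that recipe, applied to the data from the question (using equal_var=False so that scipy runs the same Welch variant as R's default t.test; the decision rule is the one described above):

import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

# Two-tailed Welch t-test; equal_var=False mirrors R's default t.test
t, p = stats.ttest_ind(A, B, equal_var=False)  # t ~ -0.4189, p ~ 0.689

alpha = 0.05
# One-tailed test of "mean(A) > mean(B)": reject H0 only if t is
# positive AND half of the two-tailed p-value falls below alpha
print((t > 0) and (p / 2 < alpha))  # False, consistent with R's p = 0.6555

As a side note, SciPy 1.6.0 and later accept an alternative='greater' keyword in ttest_ind, so on a recent SciPy the halving trick is no longer necessary.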

answered by lvc

After trying to add some insights as comments on the accepted answer, but not being able to write them down properly due to the general restrictions on comments, I decided to put in my two cents as a full answer.

First, let's formulate our investigative question properly. The data we are investigating is

import numpy as np

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

with the sample means

A.mean() = 0.99549419
B.mean() = 1.1798523

I assume that since the mean of B is obviously greater than the mean of A, you would like to check if this result is statistically significant.

So we have the Null Hypothesis

H0: mean(A) >= mean(B)

that we would like to reject in favor of the Alternative Hypothesis

H1: mean(B) > mean(A)

Now when you call scipy.stats.ttest_ind(x, y), this performs a hypothesis test on the value of x.mean() - y.mean(), which means that in order to get positive values throughout the calculation (which simplifies all considerations) we have to call

stats.ttest_ind(B, A)

instead of stats.ttest_ind(A, B). We get as an answer

  • t-value = 0.42210654140239207
  • p-value = 0.68406235191764142

and since, according to the documentation, this is the output for a two-tailed t-test, we must divide p by 2 for our one-tailed test. So, depending on the significance level alpha you have chosen, you need

p/2 < alpha

in order to reject the Null Hypothesis H0. For alpha=0.05 this is clearly not the case, so you cannot reject H0.
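As a sketch, here is the whole decision in code, reproducing the numbers above (scipy's default ttest_ind assumes equal variances, i.e. the pooled test):

import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

# Default ttest_ind is the pooled (equal-variance) two-tailed test
t, p = stats.ttest_ind(B, A)  # t ~ 0.4221, p ~ 0.6841

alpha = 0.05
print(p / 2 < alpha)  # False -> H0 cannot be rejected at the 5% level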

An alternative way to decide whether to reject H0, without having to do any algebra on t or p, is to compare the t-value with the critical t-value t_crit at the desired confidence level (e.g. 95%) for the number of degrees of freedom df that applies to your problem. Since we have

df = sample_size_1 + sample_size_2 - 2 = 8

we get from a statistical table like this one that

t_crit(df=8, confidence_level=95%) = 1.860

We clearly have

t < t_crit

so we again obtain the same result, namely that we cannot reject H0.
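If you would rather not consult a printed table, the same critical value can be computed from scipy's t distribution; a small sketch (t.ppf is the inverse CDF, also known as the quantile function):

from scipy import stats

df = 8  # sample_size_1 + sample_size_2 - 2 = 6 + 4 - 2
t_crit = stats.t.ppf(0.95, df)  # one-tailed critical value at 95% confidence
print(t_crit)  # ~1.8595, i.e. the tabulated value 1.860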

answered by bpirvu