
Confidence Interval for t-test (difference between means) in Python

I am looking for a quick way to get the t-test confidence interval in Python for the difference between means. Similar to this in R:

    X1 <- rnorm(n = 10, mean = 50, sd = 10)
    X2 <- rnorm(n = 200, mean = 35, sd = 14)
    # the scenario is similar to my data

    t_res <- t.test(X1, X2, alternative = 'two.sided', var.equal = FALSE)
    t_res

Out:

        Welch Two Sample t-test

    data:  X1 and X2
    t = 1.6585, df = 10.036, p-value = 0.1281
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -2.539749 17.355816
    sample estimates:
    mean of x mean of y
     43.20514  35.79711

Next:

    > print(c(t_res$conf.int[1], t_res$conf.int[2]))
    [1] -2.539749 17.355816

I have not found anything similar in either statsmodels or scipy, which is strange, considering the importance of confidence intervals in hypothesis testing (and how much criticism the practice of reporting only p-values has received recently).

Anarcho-Chossid asked Aug 02 '15 04:08


People also ask

How do you find the 95 confidence interval in Python?

One option is the bootstrap percentile method. Create a new sample from the dataset by drawing with replacement, using the same number of points as the original. Calculate its mean and store it in an array or list. Repeat the process many times (e.g. 1000). For a 95% confidence interval, take the 2.5th and 97.5th percentiles of the stored means.
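The resampling steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `bootstrap_mean_ci`, its parameters, and the fixed seed are hypothetical choices made here, not part of any library, and the sample data is borrowed from the answer further down.

```python
import numpy as np

def bootstrap_mean_ci(data, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)  # fixed seed so the sketch is reproducible
    means = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample with replacement, same size as the original data
        resample = rng.choice(data, size=data.size, replace=True)
        means[i] = resample.mean()
    # For level=0.95 these are the 2.5th and 97.5th percentiles
    tail = (1.0 - level) / 2.0 * 100.0
    low, high = np.percentile(means, [tail, 100.0 - tail])
    return low, high

sample = np.arange(10, 21)  # 10, 11, ..., 20 (sample mean is 15)
boot_low, boot_high = bootstrap_mean_ci(sample)
print(boot_low, boot_high)
```

The exact endpoints depend on the seed and the number of resamples, so they will differ slightly from a t-based interval on the same data.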

How do we use the confidence interval for the difference in treatment means?

The confidence interval for the difference in means provides an estimate of the absolute difference in means of the outcome variable of interest between the comparison groups. It is often of interest to make a judgment as to whether there is a statistically meaningful difference between comparison groups.


1 Answer

Here is how to use StatsModels' CompareMeans to calculate the confidence interval for the difference between means:

    import numpy as np
    import statsmodels.stats.api as sms

    X1, X2 = np.arange(10, 21), np.arange(20, 26.5, .5)

    # Welch (unequal-variance) confidence interval for mean(X1) - mean(X2)
    cm = sms.CompareMeans(sms.DescrStatsW(X1), sms.DescrStatsW(X2))
    print(cm.tconfint_diff(usevar='unequal'))

Output is

(-10.414599391793885, -5.5854006082061138) 

and matches R:

    > X1 <- seq(10,20)
    > X2 <- seq(20,26,.5)
    > t.test(X1, X2)

        Welch Two Sample t-test

    data:  X1 and X2
    t = -7.0391, df = 15.58, p-value = 3.247e-06
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -10.414599  -5.585401
    sample estimates:
    mean of x mean of y
           15        23
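For readers who want to see where that interval comes from, here is a rough sketch that computes the Welch interval by hand with NumPy and SciPy, using the Welch-Satterthwaite degrees of freedom. It is not how statsmodels implements it internally; the helper name `welch_confint_diff` and its `alpha` parameter are hypothetical names chosen for this example.

```python
import numpy as np
from scipy import stats

def welch_confint_diff(x1, x2, alpha=0.05):
    """Welch t confidence interval for mean(x1) - mean(x2) (hypothetical helper)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Squared standard error contributed by each sample (unbiased variance / n)
    v1 = x1.var(ddof=1) / x1.size
    v2 = x2.var(ddof=1) / x2.size
    se = np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (x1.size - 1) + v2 ** 2 / (x2.size - 1))
    # Two-sided critical value of the t distribution (df may be non-integer)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    diff = x1.mean() - x2.mean()
    return diff - t_crit * se, diff + t_crit * se

X1, X2 = np.arange(10, 21), np.arange(20, 26.5, 0.5)
low, high = welch_confint_diff(X1, X2)
print(low, high)
```

On the same X1 and X2 this should agree with the R interval above to several decimal places, since both use the same unequal-variance formula.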
Ulrich Stern answered Mar 03 '23 00:03