 

Perform 2 sample t-test

I have the mean, std dev and n of sample 1 and sample 2. The samples are taken from the same population, but measured by different labs.

n is different for sample 1 and sample 2. I want to do a weighted (take n into account) two-tailed t-test.

I tried using the scipy.stats module by generating my numbers with np.random.normal, since it only takes data and not summary statistics like the mean and std dev (is there any way to use these values directly?). But it didn't work, since the data arrays had to be of equal size.

Any help on how to get the p-value would be highly appreciated.

Norfeldt asked Mar 24 '14 13:03



People also ask

What does a 2 sample t-test measure?

The two-sample t-test (Snedecor and Cochran, 1989) is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment. There are several variations on this test. The data may either be paired or not paired.
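As an illustration of the classic Student's form of this test (which assumes equal population variances), here is a minimal sketch that computes it from summary statistics and cross-checks it against scipy.stats.ttest_ind_from_stats with equal_var=True. The helper name pooled_t_test and the input numbers are hypothetical.

```python
import numpy as np
from scipy import stats


def pooled_t_test(mean1, std1, n1, mean2, std2, n2):
    """Hypothetical helper: Student's two-sample t-test from summary stats.

    Assumes equal population variances; std1/std2 are sample standard
    deviations (ddof=1).
    """
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / df
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    t = (mean1 - mean2) / se
    p = 2 * stats.t.sf(abs(t), df)  # two-tailed p-value
    return t, p


# Hypothetical summary statistics for two groups.
t, p = pooled_t_test(10.0, 2.0, 30, 11.0, 2.5, 45)
t_ref, p_ref = stats.ttest_ind_from_stats(10.0, 2.0, 30, 11.0, 2.5, 45,
                                          equal_var=True)
print(f"manual: t = {t:.4f}  p = {p:.4g}")
print(f"scipy:  t = {t_ref:.4f}  p = {p_ref:.4g}")
```

The pooled variance does take n into account, but it is only appropriate when equal variances are plausible; when they may differ, Welch's version (equal_var=False) is generally the safer choice.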

What are the two types of two-sample t-tests?

The independent two-sample t-test and the paired-sample t-test.
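The difference matters in SciPy: scipy.stats.ttest_rel is the paired test and requires the two arrays to have the same length, while scipy.stats.ttest_ind handles independent groups of different sizes. A paired test is one plausible source of the equal-size error described in the question. The sketch below uses made-up random data to show both calls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Paired design: two measurements on the same 20 units, so lengths must match.
before = rng.normal(10.0, 1.0, size=20)
after = before + rng.normal(0.5, 0.5, size=20)
t_rel, p_rel = stats.ttest_rel(before, after)

# Independent design: two unrelated groups whose sizes may differ.
group_a = rng.normal(10.0, 1.0, size=20)
group_b = rng.normal(10.5, 1.0, size=35)
t_ind, p_ind = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"paired (ttest_rel):      t = {t_rel:.4f}  p = {p_rel:.4g}")
print(f"independent (ttest_ind): t = {t_ind:.4f}  p = {p_ind:.4g}")
```

For two labs measuring separate samples, the independent test is the relevant one.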

When can two-sample t methods be used?

A two-sample t-test is used when you want to compare two independent groups to see if their means are different.


1 Answer

If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False:

    t, p = ttest_ind(a, b, equal_var=False)

If you have only the summary statistics of the two data sets, you can compute the t statistic and p-value using scipy.stats.ttest_ind_from_stats (added to scipy in version 0.16) or from the formula (http://en.wikipedia.org/wiki/Welch%27s_t_test).
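Written out, the Welch formulas from the linked Wikipedia page (and used in the script that follows) are given below, with sample means \bar{x}, sample variances s^2 (ddof=1) and sample sizes n. This is where n enters as a weight: each variance is divided by its own sample size, both in the standard error and in the Welch-Satterthwaite degrees of freedom \nu.

```latex
t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}
           {\dfrac{s_a^4}{n_a^2 (n_a - 1)} + \dfrac{s_b^4}{n_b^2 (n_b - 1)}},
\qquad
p = 2\,F_\nu\!\left(-\lvert t \rvert\right)
```

Here F_\nu is the CDF of Student's t distribution with \nu degrees of freedom, which is what scipy.special.stdtr(dof, x) evaluates.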

The following script shows the possibilities.

    from __future__ import print_function

    import numpy as np
    from scipy.stats import ttest_ind, ttest_ind_from_stats
    from scipy.special import stdtr

    np.random.seed(1)

    # Create sample data.
    a = np.random.randn(40)
    b = 4*np.random.randn(50)

    # Use scipy.stats.ttest_ind.
    t, p = ttest_ind(a, b, equal_var=False)
    print("ttest_ind:            t = %g  p = %g" % (t, p))

    # Compute the descriptive statistics of a and b.
    abar = a.mean()
    avar = a.var(ddof=1)
    na = a.size
    adof = na - 1

    bbar = b.mean()
    bvar = b.var(ddof=1)
    nb = b.size
    bdof = nb - 1

    # Use scipy.stats.ttest_ind_from_stats.
    t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                                  bbar, np.sqrt(bvar), nb,
                                  equal_var=False)
    print("ttest_ind_from_stats: t = %g  p = %g" % (t2, p2))

    # Use the formulas directly.
    tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
    dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
    pf = 2*stdtr(dof, -np.abs(tf))

    print("formula:              t = %g  p = %g" % (tf, pf))

The output:

    ttest_ind:            t = -1.5827  p = 0.118873
    ttest_ind_from_stats: t = -1.5827  p = 0.118873
    formula:              t = -1.5827  p = 0.118873
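Applied to the situation in the question, where only each lab's mean, standard deviation and n are known, a minimal sketch could look like the following. The lab values are made up for illustration. Note that ttest_ind_from_stats expects the sample standard deviation (ddof=1), so a reported variance or standard error would need converting first. The manual lines repeat the formula approach using scipy.stats.t.sf instead of stdtr, as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics reported by two labs (not from the question).
mean_lab1, sd_lab1, n_lab1 = 5.20, 0.80, 12
mean_lab2, sd_lab2, n_lab2 = 4.70, 1.10, 25

# Welch's two-tailed t-test directly from summary statistics (SciPy >= 0.16).
t, p = stats.ttest_ind_from_stats(mean_lab1, sd_lab1, n_lab1,
                                  mean_lab2, sd_lab2, n_lab2,
                                  equal_var=False)

# Same result by hand: each variance is scaled by its own sample size.
v1 = sd_lab1**2 / n_lab1
v2 = sd_lab2**2 / n_lab2
t_manual = (mean_lab1 - mean_lab2) / np.sqrt(v1 + v2)
dof = (v1 + v2)**2 / (v1**2 / (n_lab1 - 1) + v2**2 / (n_lab2 - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), dof)  # two-tailed

print(f"ttest_ind_from_stats: t = {t:.4f}  p = {p:.4g}")
print(f"manual Welch:         t = {t_manual:.4f}  p = {p_manual:.4g}")
```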
Warren Weckesser answered Sep 21 '22 22:09
