I have the mean, std dev and n of sample 1 and sample 2. The samples are taken from the same population, but measured by different labs.
n is different for sample 1 and sample 2. I want to do a weighted (taking n into account) two-tailed t-test.
I tried using the scipy.stats module by generating my numbers with np.random.normal, since it only takes data and not summary statistics like mean and std dev (is there any way to use these values directly?). But it didn't work, since the data arrays have to be of equal size.
Any help on how to get the p-value would be highly appreciated.
The two-sample t-test (Snedecor and Cochran, 1989) is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment. There are several variations on this test. The data may either be paired or not paired.
The two main variants are the independent two-sample t-test and the paired-sample t-test.
A two-sample t-test is used when you want to compare two independent groups to see if their means are different.
If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False:
t, p = ttest_ind(a, b, equal_var=False)
If you have only the summary statistics of the two data sets, you can calculate the t value using scipy.stats.ttest_ind_from_stats (added to scipy in version 0.16) or from the formula (http://en.wikipedia.org/wiki/Welch%27s_t_test).
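For reference, the formula for Welch's t-test can be written out as below. The symbols are chosen here for illustration: \bar{a}, \bar{b} are the sample means, s_a^2, s_b^2 the sample variances (computed with ddof=1), n_a, n_b the sample sizes, and F_\nu the CDF of Student's t distribution with \nu degrees of freedom.

```latex
t = \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}
           {\dfrac{s_a^4}{n_a^2\,(n_a - 1)} + \dfrac{s_b^4}{n_b^2\,(n_b - 1)}},
\qquad
p = 2\,F_\nu\!\left(-\lvert t \rvert\right)
```

The sample sizes enter both the standard error and the degrees of freedom, so unequal n is taken into account without any extra weighting.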
The following script shows the possibilities.
from __future__ import print_function

import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr

np.random.seed(1)

# Create sample data.
a = np.random.randn(40)
b = 4*np.random.randn(50)

# Use scipy.stats.ttest_ind.
t, p = ttest_ind(a, b, equal_var=False)
print("ttest_ind: t = %g p = %g" % (t, p))

# Compute the descriptive statistics of a and b.
abar = a.mean()
avar = a.var(ddof=1)
na = a.size
adof = na - 1

bbar = b.mean()
bvar = b.var(ddof=1)
nb = b.size
bdof = nb - 1

# Use scipy.stats.ttest_ind_from_stats.
t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                              bbar, np.sqrt(bvar), nb,
                              equal_var=False)
print("ttest_ind_from_stats: t = %g p = %g" % (t2, p2))

# Use the formulas directly.
tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
pf = 2*stdtr(dof, -np.abs(tf))
print("formula: t = %g p = %g" % (tf, pf))
The output:
ttest_ind: t = -1.5827 p = 0.118873
ttest_ind_from_stats: t = -1.5827 p = 0.118873
formula: t = -1.5827 p = 0.118873
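For the situation in the question, where only the mean, standard deviation and n of each sample are known, a minimal sketch might look like the following. The numbers are made-up placeholders, not real lab data, and the standard deviations are assumed to be sample standard deviations (computed with ddof=1).

```python
from scipy.stats import ttest_ind_from_stats

# Hypothetical summary statistics for the two labs (placeholders only).
# std1 and std2 are assumed to be sample standard deviations (ddof=1).
mean1, std1, n1 = 10.2, 1.5, 40
mean2, std2, n2 = 9.6, 2.1, 55

# equal_var=False selects Welch's t-test, which does not assume equal
# variances and handles different sample sizes. The p-value is two-sided.
t_stat, p_value = ttest_ind_from_stats(mean1, std1, n1,
                                       mean2, std2, n2,
                                       equal_var=False)
print("t = %g p = %g" % (t_stat, p_value))
```

No random data needs to be generated, and the two samples do not have to be the same size.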