(problem resolved; x,y and s1,s2 were of different size)
in R:
x <- c(373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
y <- c(411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
t.test(x,y)
t = -1.6229, df = 29.727, p-value = 0.1152
The same numbers are obtained in Stata and Excel.
t.test(x,y,alternative="less")
t = -1.6229, df = 29.727, p-value = 0.05758
I cannot replicate the same result using either statsmodels.stats.weightstats.ttest_ind or scipy.stats.ttest_ind no matter which options I try.
statsmodels.stats.weightstats.ttest_ind(s1,s2,alternative="two-sided",usevar="unequal")
(-1.8912081781378358, 0.066740317997990656, 35.666557473974343)
scipy.stats.ttest_ind(s1,s2,equal_var=False)
(array(-1.8912081781378338), 0.066740317997990892)
scipy.stats.ttest_ind(s1,s2,equal_var=True)
(array(-1.8912081781378338), 0.066664507499812745)
There must be thousands of people who use Python to run t-tests. Are we all getting incorrect results? (I typically rely on Python, but this time I checked my results against Stata.)
Statsmodels complements SciPy's stats module. It is the part of the Python scientific stack oriented towards data analysis, data science and statistics. Statsmodels is built on top of the numerical libraries NumPy and SciPy, integrates with Pandas for data handling, and uses Patsy for an R-like formula interface.
To perform a one-sample t-test, use the scipy.stats.ttest_1samp() function. The test is calculated for the mean of one set of values.
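As a rough illustration of the one-sample case, the sketch below runs scipy.stats.ttest_1samp() on the x sample from the question and cross-checks the statistic by hand. The reference mean popmean = 250 is a made-up value used only for the example; it does not come from the question.

```python
import math
import statistics

from scipy import stats

# Sample taken from the question above.
x = [373, 398, 245, 272, 238, 241, 134, 410,
     158, 125, 198, 252, 577, 272, 208, 260]

# Hypothetical reference mean, chosen only for illustration.
popmean = 250

# One-sample t-test of the mean of x against popmean.
result = stats.ttest_1samp(x, popmean)

# Manual cross-check: t = (sample mean - popmean) / (s / sqrt(n)),
# where s is the sample standard deviation (n - 1 in the denominator).
manual_t = (statistics.mean(x) - popmean) / (
    statistics.stdev(x) / math.sqrt(len(x))
)

print(result.statistic, result.pvalue)
print(manual_t)
```

The two printed statistics should agree up to floating-point rounding.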
That's the result that I get, with default equal var:
>>> x_ = (373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
>>> y_ = (411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
>>> from scipy import stats
>>> stats.ttest_ind(x_, y_)
(array(-1.62292672368488), 0.11506840827144681)
>>> import statsmodels.api as sm
>>> sm.stats.ttest_ind(x_, y_)
(-1.6229267236848799, 0.11506840827144681, 30.0)
and with unequal var:
>>> sm.stats.ttest_ind(x_, y_, alternative="two-sided", usevar="unequal")
(-1.6229267236848799, 0.11516398707890187, 29.727196553288369)
>>> stats.ttest_ind(x_, y_, equal_var=False)
(array(-1.62292672368488), 0.11516398707890187)
The short answer is that the t-tests provided in Python give the same results as R and Stata. You just had an additional element in your Python arrays.
I wouldn't bank on Excel's robustness, however.
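To make the diagnosis above easier to reproduce, here is a minimal sketch that prints the sample sizes and then computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom by hand, using only the standard library. The helper name welch_t is hypothetical, not part of SciPy or statsmodels. Samples of unequal length are perfectly valid input for a t-test, so the length printout is only a sanity check that Python is seeing the same data as R.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Variance of each sample mean: s^2 / n
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


x = [373, 398, 245, 272, 238, 241, 134, 410,
     158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424,
     228, 144, 246, 371, 680, 384, 279, 303]

# Sanity check: compare these sizes with the vectors used in R.
print(len(x), len(y))

t, df = welch_t(x, y)
print(round(t, 4), round(df, 3))
```

With the 16-element vectors from the question, this should reproduce the t = -1.6229 and df = 29.727 reported by R. A different df, such as the 35.67 in the question, is a hint that the arrays being tested are not the ones you think they are.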