Why does t-test in Python (scipy, statsmodels) give results different from R, Stata, or Excel?

Tags: python, scipy, statsmodels

(Problem resolved: x, y and s1, s2 were of different sizes.)

In R:

x <- c(373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
y <- c(411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
t.test(x,y)
t = -1.6229, df = 29.727, p-value = 0.1152

The same numbers are obtained in Stata and Excel.

t.test(x,y,alternative="less")
t = -1.6229, df = 29.727, p-value = 0.05758

I cannot replicate the same result using either statsmodels.stats.weightstats.ttest_ind or scipy.stats.ttest_ind no matter which options I try.

statsmodels.stats.weightstats.ttest_ind(s1,s2,alternative="two-sided",usevar="unequal")
(-1.8912081781378358, 0.066740317997990656, 35.666557473974343)

scipy.stats.ttest_ind(s1,s2,equal_var=False)
(array(-1.8912081781378338), 0.066740317997990892)

scipy.stats.ttest_ind(s1,s2,equal_var=True)
(array(-1.8912081781378338), 0.066664507499812745)

There must be thousands of people who use Python to calculate t-tests. Are we all getting incorrect results? (I typically rely on Python, but this time I checked my results against Stata.)

254

asked Dec 20 '13 18:12

Oleg

2 Answers

This is the result I get with the default equal-variance assumption:

>>> x_ = (373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
>>> y_ = (411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)

>>> from scipy import stats
>>> stats.ttest_ind(x_, y_)
(array(-1.62292672368488), 0.11506840827144681)

>>> import statsmodels.api as sm
>>> sm.stats.ttest_ind(x_, y_)
(-1.6229267236848799, 0.11506840827144681, 30.0)

and with unequal variances:

>>> sm.stats.ttest_ind(x_, y_, alternative="two-sided", usevar="unequal")
(-1.6229267236848799, 0.11516398707890187, 29.727196553288369)
>>> stats.ttest_ind(x_, y_, equal_var=False)
(array(-1.62292672368488), 0.11516398707890187)

185

answered Oct 21 '22 13:10

Josef

The short answer is that the t-tests provided in Python give the same results as R and Stata; you just had an additional element in your Python arrays.

I wouldn't bank on Excel's robustness, however.
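
A quick sanity check along these lines would have caught the stray element before any test was run. This is only a sketch: `check_same_sample` is a hypothetical helper, and the `s1` below is an invented stand-in (x plus one extra value), since the original `s1` and `s2` were never posted.

```python
x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]


def check_same_sample(reference, candidate, label):
    """Raise ValueError if candidate does not hold the same data as reference."""
    if len(reference) != len(candidate):
        raise ValueError(
            f"{label}: expected {len(reference)} values, got {len(candidate)}"
        )
    if list(reference) != list(candidate):
        raise ValueError(f"{label}: values differ from the reference sample")


# Assumption for illustration only: s1 is x with one stray extra element.
s1 = x + [0]

try:
    check_same_sample(x, s1, "s1")
except ValueError as err:
    message = str(err)
    print(message)  # s1: expected 16 values, got 17
```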

answered Oct 21 '22 14:10

Russia Must Remove Putin
