
Interpreting the meaning and usage of SciPy's t.interval() function

Tags: scipy

I need some help using the scipy.stats.t.interval() function.

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html?highlight=stats.t#scipy.stats.t

I am looking at the documentation, and it doesn't make sense. What are loc and scale? I'm used to Student's t intervals requiring a mean, sd, df, and confidence level.

If you know the answer and can help, please post. Also if you could tell me how you learned it, that would be great. I've been having no luck with this documentation.

SwimBikeRun asked Jun 25 '13 06:06




2 Answers

The docs page you linked has a link to the source code, which even has a nicely formatted formula for the distribution in the comments (search for class t_gen).

loc and scale are how all the continuous distributions in scipy.stats are parametrized. Basically, for a distribution with standard density f(x), specifying loc and scale means you get the density f((x - loc) / scale) / scale, i.e. the variable is shifted by loc and stretched by scale (line 1208 in the source linked above).

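>>> import scipy.stats as stats
>>> # The defaults are loc=0, scale=1, so these two calls are identical.
>>> stats.t.pdf(2, 2)
0.06804138174397717
>>> stats.t.pdf(2, 2, loc=0, scale=1)
0.06804138174397717
>>> # loc shifts the distribution: the density at 2+42 with loc=42 matches the density at 2 with loc=0.
>>> stats.t.pdf(2+42, 2, loc=42, scale=1)
0.06804138174397717

>>> # Mean, variance, skew, kurtosis: loc moves the mean and leaves the shape alone.
>>> stats.t.stats(9, moments='mvsk')
(array(0.0), array(1.2857142857142858), array(0.0), array(1.2))
>>> stats.t.stats(8, loc=1, moments='mvsk')
(array(1.0), array(1.3333333333333333), array(0.0), array(1.5))

>>> # The central 95% interval for df=4 is centered on loc.
>>> stats.t.interval(0.95, 4, loc=0)
(-2.7764451051977987, 2.7764451051977987)
>>> stats.t.interval(0.95, 4, loc=3)
(0.22355489480220125, 5.7764451051977987)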

Yes, this is a little baffling at first sight :-).
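To see what scale does as well (the session above only varies loc), here is a small sketch. The numbers df=4, loc=3, scale=2 and x=5 are arbitrary values chosen only for illustration. It checks that the shifted and scaled pdf equals the standard pdf evaluated at (x - loc) / scale and divided by scale, and that interval() returns loc plus or minus scale times the standard critical value.

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative values, not taken from the question.
df, loc, scale = 4, 3.0, 2.0
x = 5.0

# pdf with loc/scale versus the standard pdf at the standardized point.
shifted = stats.t.pdf(x, df, loc=loc, scale=scale)
standard = stats.t.pdf((x - loc) / scale, df) / scale
print(np.isclose(shifted, standard))

# interval() with loc/scale versus loc +/- scale * (standard critical value).
lo, hi = stats.t.interval(0.95, df, loc=loc, scale=scale)
t_crit = stats.t.ppf(0.975, df)  # upper critical value of the standard t
manual = (loc - t_crit * scale, loc + t_crit * scale)
print(np.allclose((lo, hi), manual))
```

Both checks should print True, which is the whole story: loc moves the center, scale stretches the width.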

ev-br answered Nov 07 '22 11:11


Since the previous answer is not explicit, I did some research and verified that, when you want a confidence interval for a mean:

loc is the mean.

scale is the standard error of the mean.

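Such that the interval for μ is: M ± t(sM)

where μ is the population mean being estimated, M is the sample mean, t is the critical t value for the chosen confidence level with n - 1 degrees of freedom, and sM = √(std^2/n) is the standard error of the mean.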
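As a rough sketch of how that maps onto the function call: the sample values below are made up purely for illustration, and the variable names are my own. The example passes the sample mean as loc and the standard error as scale, then compares the result with M ± t(sM) computed by hand.

```python
import numpy as np
from scipy import stats

# Made-up sample data, for illustration only.
sample = np.array([4.1, 5.3, 4.8, 5.9, 5.1, 4.6])

n = sample.size
mean = sample.mean()                   # M
sem = sample.std(ddof=1) / np.sqrt(n)  # sM, using the sample standard deviation

# 95% confidence interval for the mean: df = n - 1, loc = M, scale = sM.
ci = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

# The same interval by hand: M +/- t * sM.
t_crit = stats.t.ppf(0.975, n - 1)
manual = (mean - t_crit * sem, mean + t_crit * sem)

print(ci)
print(manual)
```

The two printed tuples should agree. scipy.stats.sem(sample) computes the same standard error if you prefer not to write it out.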

rossberto answered Nov 07 '22 12:11