
How to use scipy.stats.kstest / basic questions about the Kolmogorov–Smirnov test

The documentation is at http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.stats.kstest.html. I can compute the KS test value now, but I do not understand it. The code is below.

from scipy import stats
import numpy as np
sample = np.loadtxt('mydata', delimiter=",", usecols=(2,), unpack=True)
print(stats.kstest(sample, 'poisson', args=(1,)))

Q1
If the reference distribution is constant, what should replace 'poisson' above?
Q2
What does args=(1,) mean?
Q3
For anyone interested in the KS test, here is the Wikipedia link:
http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
Can we write our own Python code for practice? We can get max(D) easily, but how do we get the Pr(K<=x) shown on that page? What is the relationship between max(D) and Pr(K<=x)?

asked Sep 25 '13 15:09 by questionhang


1 Answer

Q2: Look at this example, where x1 is an array:

>>> stats.kstest(x1, 'norm')
(0.50018855199491585, 0.0)
>>> stats.kstest(x1, stats.norm.cdf)
(0.50018855199491585, 0.0)
>>> stats.kstest(x1, stats.norm.cdf, args=(0,))
(0.50018855199491585, 0.0)
>>> stats.kstest(x1, stats.norm.cdf, args=(2,))
(0.84134903906580316, 0.0)
>>> stats.kstest(x1, 'norm', args=(2,))
(0.84134903906580316, 0.0)

If you pass the name of a distribution, e.g. 'norm', what actually gets passed to kstest is the cdf of the standard form of that distribution. For the normal distribution, standard means mean == 0 and sigma == 1. If you don't want the standard cdf, you can pass additional parameters to the cdf using args=(). In this case I passed only the mean, so we are testing the difference between x1 and a normal distribution with mean == 2 and sigma == 1.
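To make the role of args concrete, here is a minimal sketch. The original x1 is not shown, so the array below is hypothetical random data, and the seed and sample size are arbitrary choices. It checks that passing a name plus args gives the same result as calling the cdf with that extra argument yourself.

```python
import numpy as np
from scipy import stats

# Hypothetical data: the x1 from the session above is not available,
# so we draw a stand-in sample from a normal distribution with mean 2.
rng = np.random.RandomState(0)
x1 = rng.normal(loc=2.0, scale=1.0, size=200)

# Distribution name plus args=(2,): the 2 is forwarded to norm.cdf as loc (the mean).
by_name = stats.kstest(x1, 'norm', args=(2,))

# The same test, written with an explicit callable that applies the mean itself.
by_callable = stats.kstest(x1, lambda v: stats.norm.cdf(v, 2))

# Index the results so this works whether kstest returns a tuple or a result object.
d_name, p_name = by_name[0], by_name[1]
d_call, p_call = by_callable[0], by_callable[1]

print(d_name, p_name)
print(d_call, p_call)
```

Both lines should print the same statistic and p-value, since args is simply appended to the cdf call.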

Q3: The short answer is yes. But why reinvent the wheel? If you want to know how it is implemented, just check the source code. It is in your_package_folder\scipy\stats\stats.py, line 3292.
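For practice, a hand-written version might look like the sketch below. It is not scipy's implementation: ks_statistic is a hypothetical helper, the sample is made-up random data, and the p-value uses only the large-sample approximation. The link between max(D) and Pr(K<=x) is that, under the null hypothesis, sqrt(n) * D approximately follows the Kolmogorov distribution K, so the p-value is roughly 1 - Pr(K <= sqrt(n) * D).

```python
import numpy as np
from scipy import stats


def ks_statistic(sample, cdf):
    """Two-sided one-sample KS statistic D = sup |F_n(x) - F(x)| (hypothetical helper)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f = cdf(x)
    # The empirical CDF jumps at every data point, so compare the reference
    # cdf with the step value just after the jump and just before it.
    d_plus = np.max(np.arange(1, n + 1) / float(n) - f)
    d_minus = np.max(f - np.arange(0, n) / float(n))
    return max(d_plus, d_minus)


# Made-up sample, tested against the standard normal distribution.
rng = np.random.RandomState(0)
sample = rng.normal(size=500)

d_manual = ks_statistic(sample, stats.norm.cdf)

# Asymptotic p-value: Pr(K > sqrt(n) * D), with K the Kolmogorov distribution.
# stats.kstwobign.sf is the survival function, i.e. 1 - Pr(K <= x).
p_asymptotic = stats.kstwobign.sf(np.sqrt(sample.size) * d_manual)

# Compare the statistic with the one scipy reports.
d_scipy = stats.kstest(sample, 'norm')[0]

print(d_manual, d_scipy, p_asymptotic)
```

The statistic should agree with scipy's. The p-value may differ slightly from the one kstest reports, because scipy can use an exact or corrected formula depending on the sample size and version.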

answered Oct 15 '22 19:10 by CT Zhu