Two-sample Kolmogorov-Smirnov Test in Python Scipy

I can't figure out how to do a two-sample KS test in SciPy.

After reading the documentation for scipy kstest, I can see how to test whether a distribution is identical to the standard normal distribution:

from scipy.stats import kstest
import numpy as np

x = np.random.normal(0, 1, 1000)
test_stat = kstest(x, 'norm')
# >>> test_stat
# (0.021080234718821145, 0.76584491300591395)

This means that, with a p-value of 0.76, we cannot reject the null hypothesis that the sample comes from the standard normal distribution.
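A side note on the one-sample form: the string 'norm' means the standard normal N(0, 1) unless parameters are passed. The sketch below assumes you want to test against a normal with a specific mean and standard deviation (1.1 and 0.9 are simply the values used later in the question); kstest forwards its args tuple to the distribution as (loc, scale).

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
z = rng.normal(1.1, 0.9, 1000)

# 'norm' with no args is the standard normal N(0, 1), so this should reject
res_standard = kstest(z, 'norm')
# args=(loc, scale) is passed through to scipy.stats.norm
res_shifted = kstest(z, 'norm', args=(1.1, 0.9))

print(res_standard.pvalue, res_shifted.pvalue)
```

One caveat: the parameters should be fixed in advance; if loc and scale are estimated from the same sample, the usual kstest p-value is no longer accurate.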

However, I want to compare two distributions and see if I can reject the null hypothesis that they are identical, something like:

from scipy.stats import kstest
import numpy as np

x = np.random.normal(0, 1, 1000)
z = np.random.normal(1.1, 0.9, 1000)

and test whether x and z come from the same distribution.

I tried the naive approach:

test_stat = kstest(x, z) 

and got the following error:

TypeError: 'numpy.ndarray' object is not callable 
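In the SciPy version the asker was using, the second argument of kstest (cdf) had to be either a distribution name or a callable CDF, so the array z ends up being called like a function, which produces this TypeError. More recent SciPy releases document that a 1-D array in that position runs the two-sample test instead, so check the documentation for your installed version. A version-independent sketch of the two forms kstest does accept:

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 1000)

# Second argument as a distribution name...
by_name = kstest(x, 'norm')
# ...or as any callable that returns CDF values for an array of points
by_callable = kstest(x, norm(loc=0, scale=1).cdf)

print(by_name.statistic, by_callable.statistic)
```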

Is there a way to do a two-sample KS test in Python? If so, how should I do it?

Thank you in advance.

Akavall asked Jun 04 '12 16:06




2 Answers

You are using the one-sample KS test. You probably want the two-sample test ks_2samp:

>>> from scipy.stats import ks_2samp
>>> import numpy as np
>>>
>>> np.random.seed(12345678)
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> z = np.random.normal(1.1, 0.9, 1000)
>>>
>>> ks_2samp(x, y)
Ks_2sampResult(statistic=0.022999999999999909, pvalue=0.95189016804849647)
>>> ks_2samp(x, z)
Ks_2sampResult(statistic=0.41800000000000004, pvalue=3.7081494119242173e-77)
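To see what ks_2samp measures: its statistic is the largest vertical gap between the two empirical CDFs. The helper ks_statistic below is hypothetical (not part of SciPy); it evaluates both step functions at every pooled observation, since the gap can only change there, and should agree with ks_2samp. It uses NumPy's newer Generator API, so the numbers will differ from the output printed above.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_statistic(a, b):
    """Hypothetical helper: two-sample KS statistic D = max_t |F_a(t) - F_b(t)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Both empirical CDFs are step functions that only jump at observed values,
    # so the largest gap is attained at one of the pooled observations.
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(12345678)
x = rng.normal(0, 1, 1000)
y = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

for name, other in [("x vs y", y), ("x vs z", z)]:
    # Manual statistic next to SciPy's; the two columns should match
    print(name, ks_statistic(x, other), ks_2samp(x, other).statistic)
```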

The results can be interpreted as follows:

  1. You can compare the statistic returned by ks_2samp with the critical value from a KS-test table for your sample sizes and significance level. If the statistic is higher than the critical value, you reject the hypothesis that the two samples come from the same distribution.

  2. Or you can compare the p-value with a significance level α, usually α = 0.05 or 0.01 (you choose it; the lower α is, the stronger the evidence you require). If the p-value is lower than α, you reject the null hypothesis and conclude that the two distributions are different.
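Both decision rules can be sketched in a few lines. Instead of an exact table, this assumes the common large-sample approximation for the two-sided critical value, D_crit = c(α)·sqrt((n + m)/(n·m)) with c(α) = sqrt(-ln(α/2)/2), which is about 1.358 for α = 0.05. The statistics and p-values are the ones printed by ks_2samp in the answer; the helper name ks_critical_value is hypothetical.

```python
import math

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the two-sided two-sample KS critical value."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))

n = m = 1000
alpha = 0.05
d_crit = ks_critical_value(n, m, alpha)  # roughly 0.0607 for n = m = 1000

# (statistic, p-value) pairs copied from the ks_2samp output in the answer
results = {
    "x vs y": (0.022999999999999909, 0.95189016804849647),
    "x vs z": (0.41800000000000004, 3.7081494119242173e-77),
}

decisions = {}
for name, (stat, pvalue) in results.items():
    reject_by_stat = stat > d_crit   # rule 1: statistic vs critical value
    reject_by_p = pvalue < alpha     # rule 2: p-value vs significance level
    decisions[name] = (reject_by_stat, reject_by_p)
    print(f"{name}: reject by statistic={reject_by_stat}, reject by p-value={reject_by_p}")
```

Here the two rules agree. For small samples the asymptotic formula is less reliable, and the p-value SciPy reports (or an exact table) is the safer choice.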

DSM answered Oct 05 '22 09:10


This is what the scipy docs say:

If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.

"Cannot reject" does not mean we have confirmed that the two distributions are the same.
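A small simulation illustrates the point. As an illustrative assumption, both samples are drawn from normals whose means differ by 0.5, so the null hypothesis is genuinely false. With only 10 points per sample the test usually fails to reject; with 200 points it almost always rejects. The helper rejection_rate is hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

def rejection_rate(n, shift=0.5, alpha=0.05, trials=500, seed=0):
    """Fraction of simulated trials in which ks_2samp rejects H0 at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(shift, 1.0, n)  # a genuinely different distribution
        if ks_2samp(a, b).pvalue < alpha:
            rejections += 1
    return rejections / trials

small = rejection_rate(10)   # few observations: most trials fail to reject
large = rejection_rate(200)  # more observations: most trials reject
print(small, large)
```

A high p-value from a small sample is therefore weak evidence that the distributions are the same; it may only mean the test lacked the power to detect the difference.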

jun 小嘴兔 answered Oct 05 '22 10:10