I can't figure out how to do a two-sample KS test in SciPy.
After reading the documentation for scipy's kstest,
I can see how to test whether a sample follows the standard normal distribution:
from scipy.stats import kstest
import numpy as np

x = np.random.normal(0, 1, 1000)
test_stat = kstest(x, 'norm')
# >>> test_stat
# (0.021080234718821145, 0.76584491300591395)
This means that, with a p-value of 0.76, we cannot reject the null hypothesis that the sample was drawn from the standard normal distribution.
However, I want to compare two distributions and see if I can reject the null hypothesis that they are identical, something like:
from scipy.stats import kstest
import numpy as np

x = np.random.normal(0, 1, 1000)
z = np.random.normal(1.1, 0.9, 1000)
and test whether x and z come from the same distribution.
I tried the naive:
test_stat = kstest(x, z)
and got the following error:
TypeError: 'numpy.ndarray' object is not callable
Is there a way to do a two-sample KS test in Python? If so, how should I do it?
Thank you in advance.
The two-sample Kolmogorov-Smirnov test is a nonparametric test that compares the cumulative distributions of two data sets (1, 2). Being nonparametric, it does not assume that the data are sampled from Gaussian distributions (or any other defined distributions).
The Kolmogorov-Smirnov test is used to test whether or not a sample comes from a certain distribution. To perform a Kolmogorov-Smirnov test in Python we can use scipy.stats.kstest() for a one-sample test or scipy.
Fo(X) is the observed cumulative frequency distribution of a random sample of n observations: Fo(X) = k/n, where k is the number of observations ≤ X and n is the total number of observations.
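As a rough illustration of the k/n definition above, the following sketch builds the empirical CDF of each sample with NumPy and takes the largest absolute gap between the two, which is the two-sample KS statistic. The helper names `ecdf` and `ks_statistic` are made up for this example and are not part of SciPy; the result is compared against scipy.stats.ks_2samp only as a sanity check.

```python
import numpy as np
from scipy.stats import ks_2samp


def ecdf(sample, points):
    """Fraction of observations in `sample` that are <= each value in `points` (k/n)."""
    sample = np.sort(sample)
    # side="right" counts how many sorted observations are <= each point, i.e. k
    return np.searchsorted(sample, points, side="right") / sample.size


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs (hypothetical helper)."""
    # The empirical CDFs only change at observed values, so checking those is enough
    points = np.concatenate([a, b])
    return np.max(np.abs(ecdf(a, points) - ecdf(b, points)))


rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

manual = ks_statistic(x, z)
scipy_stat = ks_2samp(x, z).statistic
print(manual, scipy_stat)  # the two values should agree
```

This only reproduces the statistic; the p-value calculation is left to SciPy.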
You are using the one-sample KS test. You probably want the two-sample test ks_2samp:
>>> from scipy.stats import ks_2samp
>>> import numpy as np
>>>
>>> np.random.seed(12345678)
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> z = np.random.normal(1.1, 0.9, 1000)
>>>
>>> ks_2samp(x, y)
Ks_2sampResult(statistic=0.022999999999999909, pvalue=0.95189016804849647)
>>> ks_2samp(x, z)
Ks_2sampResult(statistic=0.41800000000000004, pvalue=3.7081494119242173e-77)
The results can be interpreted as follows:
You can either compare the statistic value given by Python to the KS-test critical value table for your sample sizes. When the statistic value is higher than the critical value, you reject the null hypothesis that the two samples come from the same distribution.
Or you can compare the p-value to a significance level a, usually a = 0.05 or 0.01 (you decide; the lower a is, the stricter the test). If the p-value is lower than a, you reject the null hypothesis and conclude that the two distributions are likely different.
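Both decision rules can be sketched in a few lines. Instead of a printed table, this example assumes the common large-sample approximation for the two-sample critical value, c(a) * sqrt((n + m) / (n * m)) with c(a) = sqrt(-ln(a / 2) / 2), which gives c of about 1.36 for a = 0.05. The function names `critical_value` and `decide` are hypothetical helpers for illustration, not SciPy functions.

```python
import numpy as np
from scipy.stats import ks_2samp


def critical_value(n, m, alpha=0.05):
    """Approximate two-sample KS critical value (large-sample formula, an assumption here)."""
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2))
    return c_alpha * np.sqrt((n + m) / (n * m))


def decide(a, b, alpha=0.05):
    """Return (reject_by_statistic, reject_by_pvalue) for the two rules described above."""
    result = ks_2samp(a, b)
    reject_by_statistic = result.statistic > critical_value(len(a), len(b), alpha)
    reject_by_pvalue = result.pvalue < alpha
    return bool(reject_by_statistic), bool(reject_by_pvalue)


np.random.seed(12345678)
x = np.random.normal(0, 1, 1000)
z = np.random.normal(1.1, 0.9, 1000)

crit = critical_value(len(x), len(z))
by_stat, by_p = decide(x, z)
print(crit, by_stat, by_p)
```

The p-value rule is usually the more convenient one, since SciPy already reports it and no table or approximation is needed.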
This is what the scipy docs say:
If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.
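"Cannot reject" doesn't mean "confirm": a high p-value only says the data give no evidence against the hypothesis, not that the hypothesis is true.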
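To make that caveat concrete, here is a small simulation sketch under assumed settings (sample size 10, a mean shift of 0.3, 200 repetitions, all chosen only for illustration). The two samples really do come from different distributions, yet with so little data one would expect the test to fail to reject in most trials.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 200
rejections = 0

for _ in range(n_trials):
    # The two populations differ (means 0.0 and 0.3), but each sample has only 10 points
    a = rng.normal(0.0, 1.0, 10)
    b = rng.normal(0.3, 1.0, 10)
    if ks_2samp(a, b).pvalue < alpha:
        rejections += 1

rejection_rate = rejections / n_trials
print(rejection_rate)  # share of trials in which the difference was detected
```

A low rejection rate here reflects limited sample size, not evidence that the distributions are the same.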