Two-sample Kolmogorov-Smirnov Test in Python Scipy

Tags: python, numpy, statistics, scipy, distribution

I can't figure out how to do a two-sample KS test in SciPy.

After reading the documentation for scipy kstest, I can see how to test whether a sample comes from a standard normal distribution:

    from scipy.stats import kstest
    import numpy as np

    x = np.random.normal(0, 1, 1000)
    test_stat = kstest(x, 'norm')
    #>>> test_stat
    #(0.021080234718821145, 0.76584491300591395)

This means that, with a p-value of 0.76, we cannot reject the null hypothesis that the sample comes from a standard normal distribution.

However, I want to compare two distributions and see if I can reject the null hypothesis that they are identical, something like:

    from scipy.stats import kstest
    import numpy as np

    x = np.random.normal(0, 1, 1000)
    z = np.random.normal(1.1, 0.9, 1000)

and test whether x and z come from the same distribution.

I tried the naive:

test_stat = kstest(x, z)

and got the following error:

TypeError: 'numpy.ndarray' object is not callable

Is there a way to do a two-sample KS test in Python? If so, how should I do it?

Thank you in advance.

asked Jun 04 '12 16:06

Akavall

2 Answers

You are using the one-sample KS test. You probably want the two-sample test ks_2samp:

    >>> from scipy.stats import ks_2samp
    >>> import numpy as np
    >>>
    >>> np.random.seed(12345678)
    >>> x = np.random.normal(0, 1, 1000)
    >>> y = np.random.normal(0, 1, 1000)
    >>> z = np.random.normal(1.1, 0.9, 1000)
    >>>
    >>> ks_2samp(x, y)
    Ks_2sampResult(statistic=0.022999999999999909, pvalue=0.95189016804849647)
    >>> ks_2samp(x, z)
    Ks_2sampResult(statistic=0.41800000000000004, pvalue=3.7081494119242173e-77)
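To make the statistic less of a black box, here is a minimal sketch that computes it by hand as the largest gap between the two empirical CDFs and compares it with what ks_2samp reports. The helper name ks_statistic and the seed are my own choices for illustration, not part of SciPy.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The gap can only change at observed values, so it is enough to
    # evaluate both empirical CDFs at every point of the pooled sample.
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

manual = ks_statistic(x, z)
result = ks_2samp(x, z)
print(manual, result.statistic, result.pvalue)
```

The hand-computed value should agree with result.statistic up to floating-point rounding; the p-value is the part you would not want to reimplement yourself.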

Results can be interpreted as follows:

You can either compare the statistic returned by SciPy to the critical value from a KS-test table for your sample sizes. When the statistic is larger than the critical value, you reject the null hypothesis that the two samples come from the same distribution.
Or you can compare the p-value to a significance level a, usually a=0.05 or 0.01 (you decide; the lower a is, the stronger the evidence required to reject). If the p-value is lower than a, you reject the null hypothesis and conclude that the two distributions differ.
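Both decision rules above can be sketched in a few lines. Instead of a printed table, this sketch assumes the usual large-sample approximation for the critical value, c(a) * sqrt((n + m) / (n * m)) with c(a) = sqrt(-ln(a / 2) / 2); for small samples an exact table is the safer choice. The helper name ks_critical_value and the seed are hypothetical, chosen for illustration.

```python
import math

import numpy as np
from scipy.stats import ks_2samp


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation to the two-sample KS critical value."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


rng = np.random.default_rng(1)  # arbitrary seed, only for repeatability
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

alpha = 0.05
result = ks_2samp(x, z)
critical = ks_critical_value(len(x), len(z), alpha)

# Rule 1: statistic above the critical value -> reject the null hypothesis.
reject_by_statistic = result.statistic > critical
# Rule 2: p-value below the significance level -> reject the null hypothesis.
reject_by_pvalue = result.pvalue < alpha
print(critical, reject_by_statistic, reject_by_pvalue)
```

For samples this different the two rules agree; near the threshold they can differ slightly because the critical value here is only an approximation.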

answered Oct 05 '22 09:10

DSM

This is what the scipy docs say:

If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.

"Cannot reject" doesn't mean we confirm: a high p-value is not evidence that the two distributions are identical.
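A rough way to see this is to draw from two distributions that really are different and vary only the sample size. The sketch below assumes a mean shift of 0.3 and sample sizes of 20 and 5000; those numbers and the seed are arbitrary choices of mine.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)  # arbitrary seed, only for repeatability

# The two distributions genuinely differ: their means are 0.3 apart.
small_x = rng.normal(0.0, 1.0, 20)
small_z = rng.normal(0.3, 1.0, 20)
large_x = rng.normal(0.0, 1.0, 5000)
large_z = rng.normal(0.3, 1.0, 5000)

p_small = ks_2samp(small_x, small_z).pvalue
p_large = ks_2samp(large_x, large_z).pvalue
print(p_small, p_large)
```

With only 20 points per sample the test will often fail to reject even though the distributions differ, while with 5000 points the difference is detected. A non-rejection therefore says as much about the sample size as about the distributions.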

answered Oct 05 '22 10:10

jun 小嘴兔
