I have two datasets, each in two dimensions: (X1, Y1) and (X2, Y2). I want to calculate a KS test statistic to determine whether the values in these two datasets arise from the same distribution or from different ones. I have used scipy.stats.ks_2samp before, but that compares two datasets in one dimension (or perhaps the probability distributions of the two samples). In this case, however, I don't have probability distributions, only discrete x and y values for both samples. How can I get the KS test value in Python for this situation? I have separate numpy arrays for each of X1, Y1, X2 and Y2. Thank you!
This Notebook provides a Python implementation of the two-sample 2D K-S test. The (broken link).py file can be downloaded here. The code seems to be a translation of C code, so efficiency might be a problem if the sample size is large.
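The algorithm was first developed in two papers: Peacock (1983) and Fasano & Franceschini (1987), both published in Monthly Notices of the Royal Astronomical Society.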
A nice introduction and a C implementation can be found in Press, W.H. et al. 1992, Numerical Recipes in C, Section 14.7, p645. C++ and Fortran implementations can be found in other versions of the book.
A post titled Beware the Kolmogorov-Smirnov test is also related to the subject; you may want to have a look.
I have also written a Python implementation using numpy; it should perform better than the notebook quoted above. You can find the code here.
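Since the links may not survive, here is a minimal sketch of how the two-sample quadrant-counting approach can be written with numpy. It is not the linked code. It follows my reading of the Numerical Recipes routine (every data point is used as an origin, the fractions of each sample in the four quadrants are compared, and the two maxima are averaged), and the significance uses the approximate formula from that section as I recall it. The function names `quadrant_fractions` and `ks2d_two_sample` are made up for illustration, and the all-pairs comparison needs memory proportional to n1*n2, so it is only meant for moderate sample sizes.

```python
import numpy as np
from scipy.special import kolmogorov  # survival function of the Kolmogorov distribution


def quadrant_fractions(origins, points):
    """Fraction of `points` falling in each of the four quadrants around every origin."""
    # Boolean arrays of shape (n_origins, n_points), built by broadcasting.
    right = points[None, :, 0] > origins[:, None, 0]
    above = points[None, :, 1] > origins[:, None, 1]
    return np.stack(
        [
            (right & above).mean(axis=1),    # upper right
            (~right & above).mean(axis=1),   # upper left
            (~right & ~above).mean(axis=1),  # lower left
            (right & ~above).mean(axis=1),   # lower right
        ],
        axis=1,
    )


def ks2d_two_sample(x1, y1, x2, y2):
    """Return (D, approximate p-value) for two 2D samples (hypothetical helper)."""
    a = np.column_stack([x1, y1]).astype(float)
    b = np.column_stack([x2, y2]).astype(float)

    def max_diff(origins):
        # Largest difference in quadrant fractions between the two samples.
        diff = quadrant_fractions(origins, a) - quadrant_fractions(origins, b)
        return np.abs(diff).max()

    # Average the maxima obtained with each sample's points as origins.
    d = 0.5 * (max_diff(a) + max_diff(b))

    # Approximate significance; assumed to match Numerical Recipes, Section 14.7.
    n_eff = len(a) * len(b) / (len(a) + len(b))
    r1 = np.corrcoef(a[:, 0], a[:, 1])[0, 1]
    r2 = np.corrcoef(b[:, 0], b[:, 1])[0, 1]
    rr = np.sqrt(1.0 - 0.5 * (r1**2 + r2**2))
    lam = d * np.sqrt(n_eff) / (1.0 + rr * (0.25 - 0.75 / np.sqrt(n_eff)))
    return float(d), float(kolmogorov(lam))


# Example with synthetic data standing in for X1, Y1, X2, Y2.
rng = np.random.default_rng(42)
X1, Y1 = rng.normal(size=300), rng.normal(size=300)
X2, Y2 = rng.normal(loc=0.5, size=250), rng.normal(size=250)
D, p = ks2d_two_sample(X1, Y1, X2, Y2)
print(f"D = {D:.3f}, approximate p = {p:.3g}")
```

The p-value formula is only an approximation and, as far as I remember, is described as reasonable for fairly large samples and small p-values, so treat anything above roughly 0.2 as "not significantly different" rather than as an exact number.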
Whichever code you use, check it against the original papers or books before applying it. The Python implementations of the 2D KS test are far less examined than the conventional tests in numpy/scipy.