I'm talking about retrieving the values of this table via a Python formula:
https://www.soest.hawaii.edu/GG/FACULTY/ITO/GG413/K_S_Table_one_Sample.pdf
I've been looking for a while, but the scipy functions do not seem to return this value, and to be honest I'm getting pretty confused.
I've been looking through scipy's built-in functions, without success. For example, in the aforementioned table, D[0.1, 10] == 0.36866. Yet scipy.stats.kstest does NOT return this same value, no matter how much I play with my data.
This can be done with scipy, using the ksone distribution and its ppf (percent point function) method, rather than kstest:
from scipy.stats import ksone

def ks_critical_value(n_trials, alpha):
    return ksone.ppf(1 - alpha / 2, n_trials)
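As a quick check against the value quoted in the question, the function can be called for n = 10 and alpha = 0.1. The sketch below also runs kstest on a small made-up sample (the `sample` array is purely illustrative and not from the question) to show that kstest reports the statistic observed in the data plus a p-value, not the tabulated critical value. The decision comes from comparing the two.

```python
import numpy as np
from scipy.stats import ksone, kstest

def ks_critical_value(n_trials, alpha):
    # Same definition as in the answer: one-sided distribution, alpha halved
    return ksone.ppf(1 - alpha / 2, n_trials)

crit = ks_critical_value(10, 0.1)
print(round(float(crit), 5))  # the table in the question lists 0.36866 here

# Hypothetical sample: 10 evenly spread points, tested against Uniform(0, 1)
sample = (np.arange(10) + 0.5) / 10
result = kstest(sample, 'uniform')

# kstest returns the observed statistic D and a p-value, not a critical value
print(result.statistic, result.pvalue)
print('reject H0' if result.statistic > crit else 'fail to reject H0')
```

So the number in the table is the threshold that the kstest statistic is compared against, which is why no amount of playing with the data makes kstest return it directly.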
Printing a table of critical values:
from __future__ import print_function  # Python 2 only; must be the first statement in the file

trials = range(1, 41)
alphas = [0.1, 0.05, 0.02, 0.01]

# Print table headers
print('{:<6}|{:<6} Level of significance, alpha'.format(' ', ' '))
print('{:<6}|{:>8} {:>8} {:>8} {:>8}'.format(*['Trials'] + alphas))
print('-' * 42)

# Print critical values for each n_trials x alpha combination
for t in trials:
    print('{:6d}|{:>8.5f} {:>8.5f} {:>8.5f} {:>8.5f}'
          .format(*[t] + [ks_critical_value(t, a) for a in alphas]))
    if t % 10 == 0:
        print()  # blank line after every 10 rows
Partial output:
      |       Level of significance, alpha
Trials|     0.1     0.05     0.02     0.01
------------------------------------------
     1|     nan      nan      nan      nan
     2| 0.77639  0.84189      nan      nan
     3| 0.63604  0.70760  0.78456  0.82900
     4| 0.56522  0.62394  0.68887  0.73424
     5| 0.50945  0.56328  0.62718  0.66853
     6| 0.46799  0.51926  0.57741  0.61661
     7| 0.43607  0.48342  0.53844  0.57581
     8| 0.40962  0.45427  0.50654  0.54179
     9| 0.38746  0.43001  0.47960  0.51332
    10| 0.36866  0.40925  0.45662  0.48893
    11| 0.35242  0.39122  0.43670  0.46770
    12| 0.33815  0.37543  0.41918  0.44905
    13| 0.32549  0.36143  0.40362  0.43247
    14| 0.31417  0.34890  0.38970  0.41762
    15| 0.30397  0.33760  0.37713  0.40420
    16| 0.29472  0.32733  0.36571  0.39201
    17| 0.28627  0.31796  0.35528  0.38086
    18| 0.27851  0.30936  0.34569  0.37062
    19| 0.27136  0.30143  0.33685  0.36117
    20| 0.26473  0.29408  0.32866  0.35241
We need some additional feedback from a statistician on (a) why we get np.nan values for the top two rows (I assume because the critical values for these combinations of n_trials and alpha are purely theoretical, and not achievable in practice), and (b) why the ksone.ppf method needs alpha to be divided by 2. I will edit this answer to include that information.
You can see, though, that apart from the initial missing values, this code produces results identical to the table in your question, and to the table on page 16 of this paper.
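On point (b), my understanding is that ksone is the distribution of the one-sided statistic, while the table in the question appears to be for the two-sided test, and halving alpha is the usual way to approximate a two-sided critical value from the one-sided distribution. As a rough sketch, assuming a SciPy version that provides scipy.stats.kstwo (the two-sided distribution, added in SciPy 1.5 as far as I know), the two can be compared side by side:

```python
from scipy.stats import ksone, kstwo

n = 10
alphas = [0.1, 0.05, 0.02, 0.01]

rows = []
for alpha in alphas:
    one_sided = float(ksone.ppf(1 - alpha / 2, n))  # one-sided distribution, alpha halved
    two_sided = float(kstwo.ppf(1 - alpha, n))      # two-sided distribution, alpha unchanged
    rows.append((alpha, one_sided, two_sided))
    print('{:<5} {:>8.5f} {:>8.5f}'.format(alpha, one_sided, two_sided))
```

If the two columns agree to the printed precision, the alpha/2 shortcut is adequate for that n. I would expect the agreement to be weakest for large alpha, where the one-sided approximation is less accurate.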