Kolmogorov-Smirnov test: exact p-values for a two-sample test applied to a discrete variable when ties exist

Tags:

r

I have two samples from different sites. The parameter I am interested in is discrete (frequencies). I ran simulations for both sites, so I know the probabilities under a random distribution for each site. From these simulations I know that the deviation of my parameter from its mean is not normally distributed, so I chose a non-parametric test. I used a one-sample Kolmogorov-Smirnov test to check whether each sample might derive from its random distribution (example data, not real):

sample1 <- rep(1:5, c(25, 12, 12, 0, 1))
rand.prob1 <- c(.51, .28, .111, .08, 0.019)
# Right-continuous CDF of the simulated null distribution on the support 1:5
StepProb1 <- stepfun(1:5, c(0, cumsum(rand.prob1)))
dgof::ks.test(sample1, StepProb1)

sample2 <- rep(1:5, c(19, 13, 10, 5, 3))
rand.prob2 <- c(.61, .18, .14, .05, 0.02)
# Right-continuous CDF of the simulated null distribution on the support 1:5
StepProb2 <- stepfun(1:5, c(0, cumsum(rand.prob2)))
dgof::ks.test(sample2, StepProb2)

In a next step, I want to check whether the samples from both sites might derive from the same distribution. Both implementations of the KS test (packages stats and dgof) issue a warning because my samples contain ties:

stats::ks.test(sample1, sample2)
dgof::ks.test(sample1, sample2)

If I understand Dufour and Farhat (2001) correctly, exact p-values can be calculated by breaking ties via Monte Carlo simulation. And if I understand the documentation of the dgof package correctly, its Monte Carlo implementation only works for the one-sample test.
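For illustration, here is a minimal base-R sketch of a Monte Carlo p-value for the two-sample statistic when ties are present. Note that it swaps in a plain label-permutation test rather than the randomized tie-breaking procedure of Dufour and Farhat (2001), and `ks_stat` and `ks_perm_test` are hypothetical helper names. Under the null hypothesis that both sites share one distribution, the group labels are exchangeable. Reshuffling the pooled sample therefore gives a valid p-value even when values are tied.

```r
# Two-sample KS statistic D: the largest absolute difference between the
# two empirical CDFs, evaluated at every distinct value of the pooled data.
# With discrete data D is well defined despite ties; only the usual
# asymptotic p-value is affected.
ks_stat <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

# Hypothetical helper: Monte Carlo permutation p-value for D.
ks_perm_test <- function(x, y, B = 9999, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  pooled <- c(x, y)
  n_x <- length(x)
  d_obs <- ks_stat(x, y)
  # Randomly reassign n_x of the pooled values to "site 1", rest to "site 2"
  d_perm <- replicate(B, {
    idx <- sample.int(length(pooled), n_x)
    ks_stat(pooled[idx], pooled[-idx])
  })
  # Count the observed labelling as one permutation, so p is never 0;
  # the small tolerance guards against floating-point noise in D.
  p <- (1 + sum(d_perm >= d_obs - 1e-12)) / (B + 1)
  list(statistic = d_obs, p.value = p)
}

# Example data from the question
sample1 <- rep(1:5, c(25, 12, 12, 0, 1))
sample2 <- rep(1:5, c(19, 13, 10, 5, 3))
res <- ks_perm_test(sample1, sample2, B = 9999, seed = 1)
res$statistic
res$p.value
```

With B random permutations the smallest attainable p-value is 1/(B + 1), so B has to be large enough to resolve p-values as small as the ones mentioned below.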

So my question: Does anybody know how to calculate exact p-values in R for a two-sample Kolmogorov-Smirnov test applied to a discrete variable when ties exist?

Or alternatively (though not specifically related to R): if nobody knows how to do this with a tolerable workload, I would use the uncorrected p-values and consequently discuss the results with care. With p-values below 0.0001, I'm actually not overly concerned about it. But what do I know... Do you think this is right, or am I making a grave mistake here?

Thanks in advance; I already appreciate that you read this far.


asked Feb 14 '14 08:02

Jonas

1 Answer

As mentioned in the comment, the function ks.boot from the Matching package implements a bootstrap Kolmogorov-Smirnov test, i.e., a Monte Carlo simulation with an arbitrary number of resamples, set via the nboots parameter. Unlike the standard ks.test, it allows ties. I think that will give you what you need.
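A minimal usage sketch, assuming the Matching package is installed and reusing the example data from the question. ks.boot takes the two samples as its first two arguments and returns a list whose ks.boot.pvalue element holds the bootstrap p-value. The value nboots = 10000 is an arbitrary choice for illustration, not a recommendation.

```r
# Assumes the Matching package is installed: install.packages("Matching")
library(Matching)

sample1 <- rep(1:5, c(25, 12, 12, 0, 1))
sample2 <- rep(1:5, c(19, 13, 10, 5, 3))

set.seed(42)  # the bootstrap p-value is random; fix the seed to reproduce it
res <- ks.boot(sample1, sample2, nboots = 10000)

res$ks.boot.pvalue  # bootstrap p-value, computed with ties allowed
res$ks              # the standard ks.test result, for comparison
```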

answered Sep 20 '22 17:09

pbible
