I used model fitting to fit the negative binomial distribution to my discrete data. As a final step it looks like I need to perform a Kolmogorov-Smirnov test to determine if the model fits the data well. All the references I could find talk about using the test for normally distributed continuous data. Can someone tell me if this can be done in R for data that is discrete and not normally distributed? (Even a chi-square test should do, I'm guessing, but please correct me if I'm wrong.)
UPDATE:
So I found that the vcd package contains a function goodfit that can be used for this purpose in the following way:
library(vcd)
# Define the data
data <- c(67, 81, 93, 65, 18, 44, 31, 103, 64, 19, 27, 57, 63, 25, 22, 150,
31, 58, 93, 6, 86, 43, 17, 9, 78, 23, 75, 28, 37, 23, 108, 14, 137,
69, 58, 81, 62, 25, 54, 57, 65, 72, 17, 22, 170, 95, 38, 33, 34, 68,
38, 117, 28, 17, 19, 25, 24, 15, 103, 31, 33, 77, 38, 8, 48, 32, 48,
26, 63, 16, 70, 87, 31, 36, 31, 38, 91, 117, 16, 40, 7, 26, 15, 89,
67, 7, 39, 33, 58)
gf <- goodfit(data, type = "nbinomial", method = "MinChisq")
plot(gf)
But after the gf <- ... step, R complains saying:
Warning messages:
1: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
2: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
3: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
and when I call plot(gf) it complains:
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' is a list, but does not have components 'x' and 'y'
I am not sure what is happening, because if I set data to be the following:
data <- rnbinom(200, size = 1.5, prob = 0.8)
everything works fine. Any suggestions?
A KS-Test is for continuous variables only, plus you have to fully specify the distribution you are testing against. If you still wanted to do it, it would look like this:
ks.test(data, pnbinom, size=100, prob=0.8)
It compares the empirical cumulative distribution function of data against the specified one (whether that makes sense probably depends on your data). You would have to choose the parameters size and prob based on theoretical considerations; the test is not valid if you estimate those parameters from the data.
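To make the point about discrete data concrete, here is a minimal sketch in base R. It uses simulated counts rather than your data, and the parameter values (size = 1.5, prob = 0.8) are assumptions chosen only for illustration. Because counts contain ties, ks.test() usually emits a warning (the exact wording may differ between R versions), so the statistic and p-value should be read as rough at best.

```r
set.seed(1)

# Simulated counts from a known negative binomial distribution.
# size and prob are illustrative assumptions, not estimates from real data.
x <- rnbinom(200, size = 1.5, prob = 0.8)

# Discrete samples of this size necessarily contain repeated values (ties).
has_ties <- anyDuplicated(x) > 0

# One-sample KS test against a fully specified distribution.
# The warning about ties is suppressed here only to keep the output short.
res <- suppressWarnings(ks.test(x, "pnbinom", size = 1.5, prob = 0.8))

has_ties
res$statistic
res$p.value
```

Note that the parameters passed to ks.test() are fixed in advance here, which is the situation the test assumes.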
Your problem with goodfit() might have to do with your data. Are you sure these are counts? barplot(table(data)) does not look like it approximately follows a negative binomial distribution; compare it, e.g., with barplot(table(rnbinom(200, size = 1.5, prob = 0.8))).
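As a numeric companion to the bar plots, the following sketch runs a few quick sanity checks on the data from the question using base R only. The method-of-moments value for size is a rough assumption-laden summary (it relies on the variance exceeding the mean) and is not what goodfit() computes.

```r
# Data copied from the question
data <- c(67, 81, 93, 65, 18, 44, 31, 103, 64, 19, 27, 57, 63, 25, 22, 150,
          31, 58, 93, 6, 86, 43, 17, 9, 78, 23, 75, 28, 37, 23, 108, 14, 137,
          69, 58, 81, 62, 25, 54, 57, 65, 72, 17, 22, 170, 95, 38, 33, 34, 68,
          38, 117, 28, 17, 19, 25, 24, 15, 103, 31, 33, 77, 38, 8, 48, 32, 48,
          26, 63, 16, 70, 87, 31, 36, 31, 38, 91, 117, 16, 40, 7, 26, 15, 89,
          67, 7, 39, 33, 58)

# Are all values non-negative integers, as counts should be?
is_count <- all(data >= 0 & data == round(data))

# A negative binomial implies variance > mean (overdispersion).
m <- mean(data)
v <- var(data)

# Rough method-of-moments estimate of the size parameter (valid only if v > m).
size_mom <- m^2 / (v - m)

# How sparse is the frequency table? Many distinct values relative to the
# sample size means most cells in table(data) hold only one or two observations.
n_obs <- length(data)
n_distinct <- length(unique(data))

c(mean = m, variance = v, size_mom = size_mom)
c(n_obs = n_obs, n_distinct = n_distinct)
```

If n_distinct is close to n_obs, the bar plot of table(data) will look nearly flat regardless of the underlying distribution, so it may be worth also looking at a histogram with wider bins.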
Finally, I'm not sure if the approach of doing a null-hypothesis test after fitting is appropriate. You may want to look into quantitative fit measures beyond / based on $\chi^2$ of which there are many (RMSEA, SRMR, ...).