Kruskal-Wallis test with details on pairwise comparisons

Tags:

r

The standard stats::kruskal.test function computes the Kruskal-Wallis test on a dataset:

> library(ggplot2)  # provides the diamonds dataset
> data(diamonds)
> kruskal.test(price~carat, data=diamonds)

Kruskal-Wallis rank sum test

data:  price by carat by color 
Kruskal-Wallis chi-squared = 50570.15, df = 272, p-value < 2.2e-16

This is correct: it gives me a p-value for the null hypothesis that all the groups in the data have the same distribution (loosely speaking, the same average price).

However, I would like to have the details for each pairwise comparison, for example whether diamonds of colors D and E have the same mean price, as some other software (e.g. SPSS) reports when you ask for a Kruskal-Wallis test.

I have found kruskalmc from the pgirmess package, which does what I want:

> kruskalmc(diamonds$price, diamonds$color)
Multiple comparison test after Kruskal-Wallis 
p.value: 0.05 
Comparisons
      obs.dif critical.dif difference
D-E  571.7459     747.4962      FALSE
D-F 2237.4309     751.5684       TRUE
D-G 2643.1778     726.9854       TRUE
D-H 4539.4392     774.4809       TRUE
D-I 6002.6286     862.0150       TRUE
D-J 8077.2871    1061.7451       TRUE
E-F 2809.1767     680.4144       TRUE
E-G 3214.9237     653.1587       TRUE
E-H 5111.1851     705.6410       TRUE
E-I 6574.3744     800.7362       TRUE
E-J 8649.0330    1012.6260       TRUE
F-G  405.7470     657.8152      FALSE
F-H 2302.0083     709.9533       TRUE
F-I 3765.1977     804.5390       TRUE
F-J 5839.8562    1015.6357       TRUE
G-H 1896.2614     683.8760       TRUE
G-I 3359.4507     781.6237       TRUE
G-J 5434.1093     997.5813       TRUE
H-I 1463.1894     825.9834       TRUE
H-J 3537.8479    1032.7058       TRUE
I-J 2074.6585    1099.8776       TRUE

However, this function accepts only one categorical variable (e.g. I can't study the prices grouped by color and by carat, as I can with kruskal.test), and I don't know anything about the pgirmess package: whether it is maintained, or whether it is tested.

Can you recommend a package that runs the Kruskal-Wallis test and returns details for every comparison? How would you handle the problem?

asked Mar 19 '10 15:03

dalloliogm

1 Answer

One other approach, besides agricolae::kruskal mentioned by Marek, is the Nemenyi-Damico-Wolfe-Dunn (NDWD) test implemented on the help page for oneway_test in the coin package, which uses multcomp. Using hadley's setup and reducing the B= value for the approximate() function so it finishes in finite time:

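library(coin)  # independence_test(), approximate(), trafo(), mcp_trafo(), pvalue()

# updated translation of the help page implementation of NDWD;
# dv, iv and sum_codings1 come from hadley's setup
NDWD <- independence_test(dv ~ iv, data = sum_codings1,
                          distribution = approximate(B = 10000),
                          ytrafo = function(data) trafo(data, numeric_trafo = rank_trafo),
                          xtrafo = mcp_trafo(iv = "Tukey"))

# global p-value
print(pvalue(NDWD))

# single-step adjusted p-values, one per pairwise comparison
# (on the help page's own data: sites (I = II) != (III = IV) at alpha = 0.01, page 244)
print(pvalue(NDWD, method = "single-step"))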

More stable results on that larger dataset may require increasing the B value and increasing the user's patience.
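Since hadley's setup (dv, iv, sum_codings1) is not reproduced here, the following is a minimal self-contained sketch of the same call on the built-in PlantGrowth data, which is only a stand-in chosen for illustration. It assumes a recent coin release, where, as far as I can tell, the number of Monte Carlo replicates is passed as nresample instead of B; on older releases B = 10000 should be used as in the code above.

```r
library(coin)  # mcp_trafo() relies on the multcomp package being installed

set.seed(1)    # the approximate (Monte Carlo) null distribution is random

# PlantGrowth (weight by group, 3 groups) is a stand-in for dv ~ iv.
ndwd <- independence_test(
  weight ~ group, data = PlantGrowth,
  distribution = approximate(nresample = 10000),  # assumption: recent coin; older: B = 10000
  ytrafo = function(data) trafo(data, numeric_trafo = rank_trafo),  # joint ranking
  xtrafo = mcp_trafo(group = "Tukey")             # all pairwise contrasts
)

global_p   <- pvalue(ndwd)                          # global p-value
pairwise_p <- pvalue(ndwd, method = "single-step")  # adjusted p-value per pair

print(global_p)
print(pairwise_p)
```

With three groups there are three pairwise contrasts, so pairwise_p should hold three adjusted p-values. Because the null distribution is resampled, the values shift slightly between runs unless the seed is fixed.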

Jan 2012: There was recently a posting on R-help claiming unexpected results from this method, so I forwarded that email to the maintainer. Mark Difford said he had confirmed the problems and offered an alternative test with the nparcomp package: https://stat.ethz.ch/pipermail/r-help/2012-January/300100.html

In the same week there were also a couple of other suggestions on R-help for post-hoc contrasts to KW tests: kruskalmc, suggested by Mario Garrido Escudero, and rms::polr followed by rms::contrasts, suggested by Frank Harrell: https://stat.ethz.ch/pipermail/r-help/2012-January/300329.html
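For readers wondering what the obs.dif and critical.dif columns of kruskalmc mean, here is a base-R sketch of a Siegel-and-Castellan-style comparison of mean ranks, which is the procedure I understand kruskalmc to follow. The function name kw_pairwise is hypothetical, and the sketch is not the pgirmess source; it only illustrates the idea: rank all observations jointly, then compare each pair's difference in mean rank against a critical value that depends on the two group sizes.

```r
# Hypothetical helper (not part of pgirmess): pairwise mean-rank comparisons.
kw_pairwise <- function(x, g, alpha = 0.05) {
  ok <- complete.cases(x, g)            # drop missing values
  x  <- x[ok]
  g  <- droplevels(factor(g[ok]))

  r         <- rank(x)                  # joint ranks; ties get the average rank
  mean_rank <- tapply(r, g, mean)       # mean rank per group
  n         <- tapply(r, g, length)     # group sizes
  N         <- length(x)
  k         <- nlevels(g)

  pairs <- combn(levels(g), 2)          # all pairs of group labels
  a <- pairs[1, ]
  b <- pairs[2, ]

  # Normal quantile with alpha split over the k(k-1)/2 two-sided comparisons
  z    <- qnorm(alpha / (k * (k - 1)), lower.tail = FALSE)
  obs  <- abs(as.numeric(mean_rank[a]) - as.numeric(mean_rank[b]))
  crit <- z * sqrt(N * (N + 1) / 12 *
                   (1 / as.numeric(n[a]) + 1 / as.numeric(n[b])))

  data.frame(obs.dif = obs, critical.dif = crit, difference = obs >= crit,
             row.names = paste(a, b, sep = "-"))
}

# Usage on a built-in dataset (a stand-in, not the diamonds data)
res <- kw_pairwise(PlantGrowth$weight, PlantGrowth$group)
print(res)
```

A pair is flagged TRUE when its observed mean-rank difference reaches the critical difference, which is why pairs involving small groups need a larger gap to be flagged.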

Nov 2015: I agree with toto_tico that the help page code of the coin package has changed in the intervening years. The ?independence_test help page now offers a multivariate KW test, and the ?oneway_test help page has replaced its earlier implementation with the code above, which uses the independence_test function.
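On the question's other concern, grouping by more than one categorical variable, one workaround that needs only base R is to collapse the factors into a single grouping factor with interaction() and then run the tests on that. The sketch below uses the built-in warpbreaks data as a stand-in, and it swaps in pairwise Wilcoxon rank-sum tests with a Holm adjustment for the post-hoc step. That is a different procedure from kruskalmc or NDWD, so treat it as one possible option and not as an equivalent.

```r
# Stand-in data: warpbreaks has two factors, wool (2 levels) and tension (3 levels).
wb <- warpbreaks
wb$cell <- interaction(wb$wool, wb$tension, sep = ":")  # 2 x 3 = 6 combined groups

# Global Kruskal-Wallis test across the six combined groups
global <- kruskal.test(breaks ~ cell, data = wb)
print(global)

# Swapped-in post-hoc step: pairwise Wilcoxon tests, Holm-adjusted p-values.
# exact = FALSE is passed on to wilcox.test() because the data contain ties.
pw <- pairwise.wilcox.test(wb$breaks, wb$cell,
                           p.adjust.method = "holm", exact = FALSE)
print(pw$p.value)   # lower-triangular matrix, one entry per pair of groups
```

The number of pairs grows quickly with the number of combined groups (15 pairs for 6 groups), so the multiplicity adjustment becomes correspondingly more conservative.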

answered Oct 07 '22 07:10

IRTFM
