Measures of association in R -- Kendall's tau-b and tau-c

Tags: r, statistics, distribution

Are there any R packages for the calculation of Kendall's tau-b and tau-c, and their associated standard errors? My searches on Google and Rseek have turned up nothing, but surely someone has implemented these in R.

asked Apr 01 '10 02:04

brentonk

7 Answers

It has been quite a while, but all three statistics (tau-a, tau-b, and tau-c) are now implemented in the DescTools package.

library(DescTools)
# Example from:
# http://support.sas.com/documentation/cdl/en/statugfreq/63124/PDF/default/statugfreq.pdf
# p. 1821
tab <- as.table(rbind(c(26,26,23,18,9),c(6,7,9,14,23)))

# tau-a
KendallTauA(tab, conf.level=0.95)
tau_a    lwr.ci    ups.ci 
0.2068323 0.1771300 0.2365346 

# tau-b
KendallTauB(tab, conf.level=0.95)
    tau_b    lwr.ci    ups.ci 
0.3372567 0.2114009 0.4631126 

# tau-c
StuartTauC(tab, conf.level=0.95)
     tauc    lwr.ci    ups.ci 
0.4110953 0.2546754 0.5675151 

# alternative for tau-b:
d.frm <- Untable(tab, dimnames = list(1:2, 1:5))
cor(as.numeric(d.frm$Var1), as.numeric(d.frm$Var2),method="kendall")
[1] 0.3372567

# But cor.test() gives no confidence interval for tau-b! Check:
unclass(cor.test(as.numeric(d.frm$Var1), as.numeric(d.frm$Var2), method="kendall"))

answered Sep 28 '22 15:09

Andri Signorell

There are three Kendall tau statistics (tau-a, tau-b, and tau-c).

They are not interchangeable, and none of the answers posted so far deal with the last two, which is the subject of the OP's question.

I was unable to find functions to calculate tau-b or tau-c, either in the R standard library (stats et al.) or in any of the packages available on CRAN or other repositories. I used the excellent R package sos to search, so I believe the results returned were reasonably thorough.

So that's the short answer to the OP's Question: no built-in or Package function for tau-b or tau-c.

But it's easy to roll your own.

Writing R functions for the Kendall statistics is just a matter of translating these equations into code:

Kendall_tau_a = (P - Q) / (n * (n - 1) / 2)

Kendall_tau_b = (P - Q) / ( (P + Q + Y0) * (P + Q + X0) ) ^ 0.5

Kendall_tau_c = (P - Q) * ( (2 * m) / (n ^ 2 * (m - 1)) )

tau-a: concordant minus discordant pairs, divided by the total number of pairs, n * (n - 1) / 2.

tau-b: accounts explicitly for ties, i.e., pairs in which both observations share the same value on x or on y. It equals concordant minus discordant pairs divided by the geometric mean of the number of pairs not tied on x (P + Q + Y0) and the number of pairs not tied on y (P + Q + X0), where X0 is the number of pairs tied only on x and Y0 is the number of pairs tied only on y.

tau-c: a variant suited to larger and non-square tables; it equals concordant minus discordant pairs multiplied by a factor that adjusts for table size, where m is the lesser of the number of rows and columns.

# Number of concordant pairs.
P = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx > c)])},
    r = r_ndx, c = c_ndx))
}

# Number of discordant pairs.
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply( function(r, c){
      sum(t[(r_ndx > r) & (c_ndx < c)])
  },
    r = r_ndx, c = c_ndx) )
}

# Sample size (total number of observations).
n = sum(t)

# The lesser of number of rows or columns.
m = min(dim(t))

So these four quantities (P, Q, n, and m) are all you need to calculate tau-a and tau-c; tau-b additionally needs X0 and Y0.

For instance, the code for tau-c is:

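kendall_tau_c = function(t){
    t = as.matrix(t)
    # m: the lesser of the number of rows or columns; n: sample size.
    m = min(dim(t))
    n = sum(t)
    # Return the value directly (a trailing assignment would return it invisibly).
    (m * 2 * (P(t) - Q(t))) / ((n ^ 2) * (m - 1))
}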
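The answer gives code for tau-c only. A comparable sketch for tau-b, written from the standard tie-corrected formula, might look like the following. The function names `count_pairs` and `kendall_tau_b` are my own (they do not come from any package), and the example table is the SAS table used in the DescTools answer.

```r
# Sketch: Kendall's tau-b from a two-way contingency table (base R only).
# `count_pairs` and `kendall_tau_b` are illustrative names, not package functions.
count_pairs <- function(tab) {
  tab <- as.matrix(tab)
  nr <- nrow(tab)
  nc <- ncol(tab)
  P <- 0
  Q <- 0
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      below <- seq_len(nr) > i
      # Concordant pairs: cells strictly below and to the right of (i, j).
      P <- P + tab[i, j] * sum(tab[below, seq_len(nc) > j])
      # Discordant pairs: cells strictly below and to the left of (i, j).
      Q <- Q + tab[i, j] * sum(tab[below, seq_len(nc) < j])
    }
  }
  c(P = P, Q = Q)
}

kendall_tau_b <- function(tab) {
  tab <- as.matrix(tab)
  pq <- count_pairs(tab)
  n <- sum(tab)
  total <- n * (n - 1) / 2
  # Pairs tied on the row variable / on the column variable.
  tied_rows <- sum(rowSums(tab) * (rowSums(tab) - 1) / 2)
  tied_cols <- sum(colSums(tab) * (colSums(tab) - 1) / 2)
  unname((pq["P"] - pq["Q"]) / sqrt((total - tied_rows) * (total - tied_cols)))
}

tab <- rbind(c(26, 26, 23, 18, 9), c(6, 7, 9, 14, 23))
kendall_tau_b(tab)  # about 0.3373
```

Here `total - tied_rows` is the number of pairs not tied on the row variable, and likewise for columns, so the denominator is the geometric mean described above.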

So how are Kendall's tau statistics related to the other statistical tests used in categorical data analysis?

All three Kendall tau statistics, along with Goodman and Kruskal's gamma, measure association between ordinal or binary variables. (The Kendall tau statistics are more sophisticated alternatives to the gamma statistic, which is just (P - Q) / (P + Q) and so ignores tied pairs.)

And so Kendall's tau and gamma are counterparts to the simple chi-square and Fisher's exact tests, both of which are (as far as I know) suitable only for nominal data.
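For comparison, gamma can be sketched in a few lines of base R. This is only an illustration: the function name `goodman_kruskal_gamma` is hypothetical, and it expands the table into one observation per row before comparing all pairs, which is fine for small tables but quadratic in the sample size.

```r
# Sketch: Goodman and Kruskal's gamma, (P - Q) / (P + Q), from a contingency table.
# `goodman_kruskal_gamma` is an illustrative name, not a package function.
goodman_kruskal_gamma <- function(tab) {
  tab <- as.matrix(tab)
  # Expand the table into one (row index, column index) pair per observation.
  x <- rep(as.vector(row(tab)), times = as.vector(tab))
  y <- rep(as.vector(col(tab)), times = as.vector(tab))
  # +1 for concordant ordered pairs, -1 for discordant, 0 for ties.
  s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
  P <- sum(s > 0) / 2  # each unordered pair appears twice in the matrix
  Q <- sum(s < 0) / 2
  (P - Q) / (P + Q)
}

tab <- rbind(c(26, 26, 23, 18, 9), c(6, 7, 9, 14, 23))
goodman_kruskal_gamma(tab)  # about 0.53
```

Because tied pairs drop out of both numerator and denominator, gamma is never smaller in magnitude than tau-b on the same table.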

example:

cpa_group = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1)
revenue_per_customer_group = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2)
weight = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)

dfx = data.frame(CPA=cpa_group, LCV=revenue_per_customer_group, freq=weight)

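# Reshape data frame so there is 1 row for each event
# (prerequisite step to create a frequency-weighted contingency table).
dfx2 = data.frame(lapply(dfx, function(x) { rep(x, dfx$freq)}))

# Tabulate by the actual column names (LCV, CPA). Using dfx counts each row once
# and gives the value below; use dfx2 instead to respect the freq weights.
t = xtabs(~ LCV + CPA, dfx)

kc = kendall_tau_c(t)

# Returns about -0.35.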

answered Sep 28 '22 15:09

doug

Just to expand on Stedy's answer: cor(x, y, method="kendall") will give you the correlation, and cor.test(x, y, method="kendall") will give you a p-value (but, for Kendall's method, no confidence interval).
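A quick way to see what cor.test() returns for Kendall's method is to inspect the result object. The sketch below uses the built-in mtcars data purely as an example; exact = FALSE requests the normal approximation, since the data contain ties.

```r
# Sketch: what cor.test() returns for Kendall's method (base R, built-in mtcars data).
res <- cor.test(mtcars$mpg, mtcars$hp, method = "kendall", exact = FALSE)

res$estimate            # the tau estimate
res$p.value             # p-value from the normal approximation
is.null(res$conf.int)   # TRUE: no confidence interval is returned for Kendall
```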

Also, take a look at the Kendall package, which provides a function which claims a better approximation.

> library(Kendall)
> Kendall(x,y)

There is also the cor.matrix function in the Deducer package for nice printing:

> library(Deducer)
> cor.matrix(variables=d(mpg,hp,wt),,
+ data=mtcars,
+ test=cor.test,
+ method='kendall',
+ alternative="two.sided",exact=F)

                          Kendall's rank correlation tau                          

           mpg     hp      wt     
mpg    cor 1       -0.7428 -0.7278
         N 32      32      32     
    stat**         -5.871  -5.798 
   p-value         0.0000  0.0000 
----------                        
 hp    cor -0.7428 1       0.6113 
         N 32      32      32     
    stat** -5.871          4.845  
   p-value 0.0000          0.0000 
----------                        
 wt    cor -0.7278 0.6113  1      
         N 32      32      32     
    stat** -5.798  4.845          
   p-value 0.0000  0.0000         
----------                        
    ** z
    HA: two.sided

answered Sep 28 '22 15:09

Ian Fellows

I stumbled across this page today while looking for an implementation of Kendall's tau-b in R. For anyone else looking for the same thing: tau-b is in fact part of the stats package.

See this link for more details: https://stat.ethz.ch/pipermail/r-help//2012-August/333656.html

I tried it and it works:

library(stats)

x <- c(1, 1, 2)
y <- c(1, 2, 3)
cor.test(x, y, method = "kendall", alternative = "greater")

This is the output:

data:  x and y
z = 1.2247, p-value = 0.1103
alternative hypothesis: true tau is greater than 0
sample estimates:
      tau 
0.8164966 

Warning message:
In cor.test.default(x, y, method = "kendall", alternative = "greater") :
  Cannot compute exact p-value with ties

The warning only means that a normal approximation is used for the p-value because of the ties. The reported tau is in fact tau-b!
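To confirm by hand that the reported value is tau-b, one can enumerate the three pairs directly. This is a small sketch in base R; the variable names are my own.

```r
# Sketch: hand-check that cor()'s Kendall estimate is tau-b for the example above.
x <- c(1, 1, 2)
y <- c(1, 2, 3)

idx <- combn(length(x), 2)            # all unordered pairs of observations
dx <- sign(x[idx[1, ]] - x[idx[2, ]])
dy <- sign(y[idx[1, ]] - y[idx[2, ]])

P <- sum(dx * dy > 0)                 # concordant pairs
Q <- sum(dx * dy < 0)                 # discordant pairs
not_tied_x <- sum(dx != 0)            # pairs not tied on x
not_tied_y <- sum(dy != 0)            # pairs not tied on y

tau_b <- (P - Q) / sqrt(not_tied_x * not_tied_y)
tau_b                                 # 2 / sqrt(6), about 0.8165
all.equal(tau_b, cor(x, y, method = "kendall"))
```

With one pair tied on x, tau-a would be 2/3 instead, so the 0.8165 in the output can only be the tie-corrected statistic.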

answered Sep 28 '22 16:09

nafrtiti

Doug's answer here is incorrect. The Kendall package can be used to calculate tau-b.

The Kendall() function in the Kendall package (and, it would seem, cor(x, y, method="kendall") as well) handles ties using the formula for tau-b. For vectors with ties, however, the Kendall package gives the more accurate p-value. See page 4 of the package documentation at https://cran.r-project.org/web/packages/Kendall/Kendall.pdf, where D refers to the denominator of the Kendall calculation:

and D = n(n − 1)/2. S is called the score and D, the denominator, is the maximum possible value of S. When there are ties, the formula for D is more complicated (Kendall, 1974, Ch. 3) and this general formula for ties in both rankings is implemented in our function. The p-value of tau under the null hypothesis of no association is computed in the case of no ties using an exact algorithm given by Best and Gipps (1974). When ties are present, a normal approximation with continuity correction is used by taking S as normally distributed with mean zero and variance var(S), where var(S) is given by Kendall (1976, eqn 4.4, p.55). Unless ties are very extensive and/or the data is very short, this approximation is adequate. If extensive ties are present then the bootstrap provides an expedient solution (Davis and Hinkley, 1997). Alternatively an exact method based on exhaustive enumeration is also available (Valz and Thompson, 1994) but this is not implemented in this package.
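The bootstrap mentioned in the quoted documentation can be sketched in base R. This is a plain percentile bootstrap under my own assumptions (mtcars as example data, 1000 resamples, a fixed seed); it is not the procedure the Kendall package itself uses.

```r
# Sketch: percentile bootstrap for Kendall's tau-b (base R only).
# Assumptions: mtcars as example data, B = 1000 resamples, fixed seed.
set.seed(1)
x <- mtcars$mpg
y <- mtcars$hp
B <- 1000

boot_tau <- replicate(B, {
  i <- sample(length(x), replace = TRUE)   # resample observation pairs
  cor(x[i], y[i], method = "kendall")
})

est <- cor(x, y, method = "kendall")       # point estimate on the full data
se  <- sd(boot_tau)                        # bootstrap standard error
ci  <- quantile(boot_tau, c(0.025, 0.975)) # 95% percentile interval

est
se
ci
```

Resampling whole (x, y) pairs keeps the dependence between the two variables intact, which is what makes the spread of `boot_tau` a usable estimate of the sampling variability.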

I originally made an edit to Doug's answer regarding this, but it was rejected for 'being directed at the author and more appropriate as an answer or a comment'. I would have left this as a comment on the answer, but my reputation is not yet high enough to comment.

answered Sep 28 '22 16:09

j.m.sappenfield

Have you tried the function cor? Its method argument can be set to "kendall" (with "pearson" and "spearman" also available if needed). I am not sure whether that covers the standard errors you are looking for, but it should get you started.

answered Sep 28 '22 15:09

Stedy

I have been doing a bit of research on Kendall's tau. Directly using cor(x, y, method="kendall") will give you Kendall's tau-b, which differs slightly from the original definition, i.e., Kendall's tau-a. Kendall's tau-b is more commonly used because it takes ties into account; hence most available software (e.g. cor(), Kendall()) calculates Kendall's tau-b.

The difference between Kendall's tau-a and tau-b is essentially the denominator. Specifically, for Kendall's tau-a, the denominator is D=n*(n-1)/2, which is fixed, while for Kendall's tau-b, the denominator is D=sqrt(No. pairs of Var1 excluding tied pairs)*sqrt(No. pairs of Var2 excluding tied pairs). Because ties shrink the tau-b denominator, the magnitude of tau-b is at least as large as that of tau-a.

As a simple example, consider X=(1,2,3,4,4), Y=(2,3,4,4,4). Kendall's tau-b=0.88, while tau-a=0.7.
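The two numbers in this example can be reproduced in base R. cor() supplies tau-b; for tau-a the sketch below defines a small helper whose name, `kendall_tau_a`, is my own.

```r
# Sketch: tau-a versus tau-b for the example vectors (base R only).
# `kendall_tau_a` is an illustrative helper, not a package function.
kendall_tau_a <- function(x, y) {
  # Sum of sign products over ordered pairs; halve to count each pair once.
  s <- sum(sign(outer(x, x, "-")) * sign(outer(y, y, "-"))) / 2
  s / choose(length(x), 2)   # fixed denominator n * (n - 1) / 2
}

X <- c(1, 2, 3, 4, 4)
Y <- c(2, 3, 4, 4, 4)

kendall_tau_a(X, Y)              # 0.7
cor(X, Y, method = "kendall")    # about 0.88 (tau-b)
```

Here 7 of the 10 pairs are concordant and none discordant, so tau-a is 7/10, while tau-b divides by sqrt(9 * 7) because one pair is tied on X and three are tied on Y.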

As for Kendall's tau-c, I did not find much on it, so I have no comments.

answered Sep 28 '22 17:09

SLi
