Spearman correlation and ties

Tags: r, correlation

I'm computing Spearman's rho on small sets of paired rankings. Spearman's rho is well known for not handling ties properly. For example, with two sets of 8 rankings, even when 6 of the values in one set are tied, the correlation is still very high:

```
> cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman")

        Spearman's rank correlation rho

S = 19.8439, p-value = 0.0274
sample estimates:
      rho
0.7637626

Warning message:
Cannot compute exact p-values with ties
```

A p-value below .05 seems like surprisingly strong statistical significance for this data. Is there a ties-corrected version of Spearman's rho in R? What is the best current formula for computing it when there are many ties?
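As a side note on where that p-value comes from: because of the ties, cor.test cannot use the exact null distribution of rho. A minimal sketch, assuming (per the cor.test help page) that it then falls back to the asymptotic t approximation with n - 2 degrees of freedom, reproduces the number from the same data:

```r
# Data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)
n <- length(x)

# Spearman's rho; R gives tied values their average rank
rho <- cor(x, y, method = "spearman")

# Assumed approximation: t = rho * sqrt((n - 2) / (1 - rho^2)),
# compared against a t distribution with n - 2 df (two-sided p-value)
t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
p_approx <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# exact = FALSE asks for the approximate p-value directly (no ties warning)
ct <- cor.test(x, y, method = "spearman", exact = FALSE)
c(manual = p_approx, cor.test = ct$p.value)
```

Both values should come out at about 0.0274, matching the output above.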

Mulone asked May 22 '12 at 23:05



People also ask

What does Spearman rank correlation tell you?

Spearman's rank correlation measures the strength and direction of association between two ranked variables. It essentially measures how monotonic the relationship between two variables is, i.e. how well that relationship can be described by a monotonic function.
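A quick way to see the monotonicity point is to compare Spearman's rho with Pearson's r on a relationship that is monotonic but not linear. A minimal sketch; the data (y = exp(x)) are made up purely for illustration:

```r
x <- 1:10
y <- exp(x)   # strictly increasing, but far from linear

spearman <- cor(x, y, method = "spearman")  # ranks agree perfectly
pearson  <- cor(x, y)                        # penalised by the curvature

c(spearman = spearman, pearson = pearson)
```

Spearman's rho is 1 because the ordering is preserved exactly, while Pearson's r is well below 1.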

What is Spearman's correlation used for?

Spearman rank correlation is a non-parametric test used to measure the degree of association between two variables.

How do you rank up a tied score?

If k values are tied and would occupy rank positions R through R+k-1, there are several common conventions. MEAN method: assign each tied value the mean position, (R + (R+k-1))/2 = R + (k-1)/2. LOW or HIGH method: with the low method, each tied value gets rank R; with the high method, each gets rank R+k-1.
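In R these conventions correspond to the ties.method argument of rank(): "average" (the default) is the MEAN method, and "min" and "max" are the LOW and HIGH methods. A small made-up example with R = 2 and k = 3:

```r
scores <- c(10, 20, 20, 20, 30)   # three values tied at positions 2, 3, 4

rank(scores, ties.method = "average")  # MEAN: 1 3 3 3 5
rank(scores, ties.method = "min")      # LOW:  1 2 2 2 5
rank(scores, ties.method = "max")      # HIGH: 1 4 4 4 5
```

The mid-rank ("average") convention is the one that matters for Spearman's rho, since it keeps the sum of ranks the same as without ties.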


1 Answer

Well, Kendall's tau rank correlation is also a non-parametric test of statistical dependence between two ordinal (or rank-transformed) variables. It is similar to Spearman's rho, but unlike the basic Spearman formula, it has a standard form designed to handle ties.

More specifically, there are three Kendall tau statistics--tau-a, tau-b, and tau-c. tau-b is specifically adapted to handle ties.

The tau-b statistic handles ties (i.e., both members of a pair have the same ordinal value) through its divisor, which is the geometric mean of the number of pairs not tied on x and the number of pairs not tied on y.

Kendall's tau is not Spearman's rho; they are not the same, but they are quite similar. You'll have to decide, based on context, whether they are similar enough that one can be substituted for the other.

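For instance, tau-b:

Kendall_tau_b = (P - Q) / sqrt( (P + Q + X0) * (P + Q + Y0) )

P: number of concordant pairs ('concordant' means the two data points are ordered the same way on x and on y)

Q: number of discordant pairs (the two data points are ordered in opposite ways on x and on y)

X0: number of pairs tied on x but not on y

Y0: number of pairs tied on y but not on x

Pairs tied on both x and y do not enter the formula. With these definitions, P + Q + Y0 is the number of pairs not tied on x and P + Q + X0 is the number of pairs not tied on y, so the denominator is the geometric mean described above.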
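To make the counting concrete, here is a minimal sketch that computes tau-b by brute-force pair counting on the data from the question. The function name tau_b_counts is hypothetical, not part of any package. X0 and Y0 are taken as the pairs tied on only one of the two variables, so each factor under the square root is the number of pairs not tied on the other variable. As a cross-check it is compared with cor(..., method = "kendall"), which, to my understanding, returns tau-b when there are ties.

```r
# Hypothetical helper: Kendall's tau-b by explicit pair counting
tau_b_counts <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  P <- Q <- X0 <- Y0 <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      dx <- sign(x[j] - x[i])
      dy <- sign(y[j] - y[i])
      if (dx != 0 && dy != 0) {
        # untied on both: concordant if the signs agree, else discordant
        if (dx == dy) P <- P + 1 else Q <- Q + 1
      } else if (dx == 0 && dy != 0) {
        X0 <- X0 + 1   # tied on x only
      } else if (dy == 0 && dx != 0) {
        Y0 <- Y0 + 1   # tied on y only
      }
      # pairs tied on both x and y are ignored
    }
  }
  (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)
tau_manual  <- tau_b_counts(x, y)
tau_builtin <- cor(x, y, method = "kendall")
c(manual = tau_manual, builtin = tau_builtin)
```

For these data P = 13, Q = 0, X0 = 0 and Y0 = 15 (the six tied zeros form 15 pairs), so tau-b = 13 / sqrt(13 * 28), about 0.681. The double loop is O(n^2), which is fine for small sets of rankings like these.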

There is in fact a variant of Spearman's rho that explicitly accounts for ties: compute Pearson's correlation on the ranks, giving tied values their average (mid-)rank, which is what R's cor(x, y, method = "spearman") does. In situations where I needed a non-parametric rank correlation statistic, I have always chosen tau over rho. The reason is that rho is built on squared rank differences, whereas tau counts pairwise disagreements (discordant pairs), so each discrepancy contributes a linear rather than a squared penalty. Given that both tau and rho are competent statistics and we are left to choose, a linear penalty on discrepancies (tau) has always seemed to me a more natural way to express rank correlation. That's not a recommendation; your context might be quite different and dictate otherwise.
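The mid-rank variant of Spearman's rho can be checked directly on the question's data. A minimal sketch in base R, comparing Pearson's correlation of the mid-ranks with the textbook shortcut 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which assumes there are no ties:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

rx <- rank(x)   # 1 2 3 4 5 6 7 8
ry <- rank(y)   # the six zeros share the mid-rank 3.5, then 7 and 8

# Tie-corrected Spearman: Pearson correlation of the mid-ranks
rho_ties <- cor(rx, ry)

# Classic shortcut formula, exact only when there are no ties
n <- length(x)
d <- rx - ry
rho_naive <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

c(ties_corrected = rho_ties, no_ties_formula = rho_naive)
```

The mid-rank version gives about 0.764, the same rho that cor.test reports in the question, while the no-ties shortcut gives about 0.792. So the 0.764 in the question is already the tie-corrected value; with ties, the shortcut formula and the mid-rank Pearson version no longer agree.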

doug answered Sep 21 '22 at 11:09
