I'm computing Spearman's rho on small sets of paired rankings. Spearman is well known for not handling ties properly. For example, taking 2 sets of 8 rankings, even if 6 are ties in one of the two sets, the correlation is still very high:
    > cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman")

    Spearman's rank correlation rho

    S = 19.8439, p-value = 0.0274
    sample estimates:
          rho
    0.7637626

    Warning message:
    Cannot compute exact p-values with ties
The p-value <.05 seems like a pretty high statistical significance for this data. Is there a ties-corrected version of Spearman in R? What is the best formula to date to compute it with a lot of ties?
Spearman's rank correlation measures the strength and direction of association between two ranked variables. It measures how monotonic the relationship between the two variables is, i.e., how well that relationship can be described by a monotonic function.
Spearman rank correlation: Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables.
MEAN method: Suppose k tied values would occupy rank positions R through R+k-1. With the mean method, each tied value is assigned the mean of those positions, which is (R + (R+k-1))/2 = R + (k-1)/2.
LOW or HIGH method: For the low method, each tied value is assigned rank R. For the high method, each tied value is assigned rank R+k-1.
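In R, these three conventions correspond to the `ties.method` argument of the base function `rank()`: `"average"` for the mean method, `"min"` for low, and `"max"` for high. A minimal sketch using the tied vector from the question (the variable names are only illustrative):

```r
# Tied vector from the question: the six zeros share rank positions 1..6,
# so R = 1 and k = 6 for that group.
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

rank_mean <- rank(y, ties.method = "average")  # MEAN: R + (k-1)/2 = 1 + 5/2
rank_low  <- rank(y, ties.method = "min")      # LOW:  R = 1
rank_high <- rank(y, ties.method = "max")      # HIGH: R + k - 1 = 6

print(rank_mean)  # 3.5 3.5 3.5 3.5 3.5 3.5 7.0 8.0
print(rank_low)   # 1 1 1 1 1 1 7 8
print(rank_high)  # 6 6 6 6 6 6 7 8
```

Only the tied group changes between methods; the untied values 7 and 8 keep ranks 7 and 8 in all three.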
Well, Kendall's tau rank correlation is also a non-parametric test for statistical dependence between two ordinal (or rank-transformed) variables, like Spearman's, but unlike Spearman's it can handle ties.
More specifically, there are three Kendall tau statistics--tau-a, tau-b, and tau-c. tau-b is specifically adapted to handle ties.
The tau-b statistic handles ties (i.e., both members of the pair have the same ordinal value) by a divisor term, which represents the geometric mean between the number of pairs not tied on x and the number not tied on y.
Kendall's tau and Spearman's rho are not the same statistic, but they are quite similar. You'll have to decide, based on context, whether the two are similar enough that one can be substituted for the other.
For instance, tau-b:
Kendall_tau_b = (P - Q) / ( (P + Q + Y0)*(P + Q + X0) )^0.5
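P: number of concordant pairs ('concordant' means x and y rank the two data points of the pair in the same order)
Q: number of discordant pairs (x and y rank the two data points in opposite orders)
X0: number of pairs tied only on x (so P + Q + X0 is the number of pairs not tied on y)
Y0: number of pairs tied only on y (so P + Q + Y0 is the number of pairs not tied on x)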
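To make the counts concrete, here is a rough sketch that tallies P, Q, X0 and Y0 by brute force over all pairs and plugs them into the formula above. The function name `kendall_tau_b` is my own, and I am reading X0 and Y0 as pairs tied on one variable only, which is what makes the divisor the geometric mean of the pairs not tied on x and the pairs not tied on y. As far as I know, R's built-in `cor(..., method = "kendall")` returns the same tau-b value, so it is used here as a cross-check.

```r
# Brute-force tau-b: fine for small samples, O(n^2) pairs.
kendall_tau_b <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  idx <- utils::combn(length(x), 2)        # all unordered pairs (i, j)
  sx <- sign(x[idx[1, ]] - x[idx[2, ]])    # ordering of each pair on x
  sy <- sign(y[idx[1, ]] - y[idx[2, ]])    # ordering of each pair on y
  P  <- sum(sx * sy > 0)                   # concordant pairs
  Q  <- sum(sx * sy < 0)                   # discordant pairs
  X0 <- sum(sx == 0 & sy != 0)             # tied only on x
  Y0 <- sum(sy == 0 & sx != 0)             # tied only on y
  (P - Q) / sqrt((P + Q + Y0) * (P + Q + X0))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

tau_manual  <- kendall_tau_b(x, y)
tau_builtin <- cor(x, y, method = "kendall")
print(c(manual = tau_manual, builtin = tau_builtin))
```

For the question's data this gives P = 13, Q = 0, X0 = 0 and Y0 = 15, so tau-b = 13 / sqrt(28 * 13), roughly 0.68, which is noticeably lower than the rho of 0.76.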
There is in fact a variant of Spearman's rho that explicitly accounts for ties. Still, in situations in which I needed a non-parametric rank correlation statistic, I have always chosen tau over rho. The reason is that rho sums the squared errors, whereas tau sums the absolute discrepancies. Given that both tau and rho are competent statistics and we are left to choose, a linear penalty on discrepancies (tau) has always seemed to me a more natural way to express rank correlation. That's not a recommendation; your context might be quite different and dictate otherwise.
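On the tie-aware variant of rho mentioned above: a minimal sketch, assuming that variant is the Pearson correlation of the mid-ranks (tied values get the mean of the rank positions they occupy). It compares that against the familiar shortcut formula 1 - 6*sum(d^2)/(n*(n^2 - 1)), which is only valid without ties. The variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)
n <- length(x)

rx <- rank(x)  # default ties.method = "average" gives mid-ranks
ry <- rank(y)

# Shortcut formula, which assumes no ties
rho_shortcut <- 1 - 6 * sum((rx - ry)^2) / (n * (n^2 - 1))

# Tie-aware form: Pearson correlation of the mid-ranks
rho_midrank <- cor(rx, ry)

# Built-in estimate, for comparison
rho_builtin <- cor(x, y, method = "spearman")

print(c(shortcut = rho_shortcut, midrank = rho_midrank, builtin = rho_builtin))
```

On the question's data the shortcut gives about 0.79, while the mid-rank version gives 0.7637626, the same rho that `cor.test` printed. That suggests the reported estimate already includes the tie adjustment, and that the warning concerns only the exact p-value.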