Spearman correlation and ties

Q: What does Spearman rank correlation tell you?

Spearman's rank correlation measures the strength and direction of association between two ranked variables. It quantifies how monotonic the relationship is, i.e. how well the relationship between the two variables can be described by a monotonic function.

Q: What is Spearman's correlation used for?

Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables.

Q: How do you rank tied scores?

MEAN method: If k tied values would occupy ranks R through R+k-1, assign each of them the mean position, which is (R + (R+k-1))/2 = R + (k-1)/2.

LOW or HIGH method: For the low method, the rank of the tied values is assigned to be R. For the high method, the rank of the tied values is assigned to be R+k-1.
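As a rough illustration in R (the vector x below is made up for the example), base R's rank() exposes these choices through its ties.method argument: "average" corresponds to the MEAN method, "min" to LOW, and "max" to HIGH.

```r
# Hypothetical scores: the three 20s are tied and would occupy ranks 2, 3, 4,
# so R = 2 and k = 3 in the notation used for the methods.
x <- c(10, 20, 20, 20, 30)

r_mean <- rank(x, ties.method = "average")  # tied values get R + (k-1)/2 = 3
r_low  <- rank(x, ties.method = "min")      # tied values get R = 2
r_high <- rank(x, ties.method = "max")      # tied values get R + k - 1 = 4

# Show the three rankings side by side
print(rbind(mean = r_mean, low = r_low, high = r_high))
```

Spearman's rho is normally computed on the "average" ranks, so that is the variant that matters for the question below.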

Tags: r, correlation

I'm computing Spearman's rho on small sets of paired rankings. Spearman is well known for not handling ties properly. For example, taking 2 sets of 8 rankings, even if 6 are ties in one of the two sets, the correlation is still very high:

> cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman")

    Spearman's rank correlation rho

S = 19.8439, p-value = 0.0274
sample estimates:
      rho
0.7637626

Warning message:
Cannot compute exact p-values with ties

The p-value <.05 seems like a pretty high statistical significance for this data. Is there a ties-corrected version of Spearman in R? What is the best formula to date to compute it with a lot of ties?


asked May 22 '12 23:05

Mulone

1 Answer

Kendall's tau rank correlation is also a non-parametric test for statistical dependence between two ordinal (or rank-transformed) variables. It is like Spearman's but, unlike Spearman's, it can handle ties.

More specifically, there are three Kendall tau statistics--tau-a, tau-b, and tau-c. tau-b is specifically adapted to handle ties.

The tau-b statistic handles ties (i.e., both members of the pair have the same ordinal value) by a divisor term, which represents the geometric mean between the number of pairs not tied on x and the number not tied on y.

Kendall's tau and Spearman's rho are not the same statistic, but they are quite similar. You'll have to decide, based on context, whether the two are similar enough that one can be substituted for the other.

For instance, tau-b:

Kendall_tau_b = (P - Q) / ( (P + Q + Y0)*(P + Q + X0) )^0.5

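P: number of concordant pairs ('concordant' means the two data points in the pair are ordered the same way on x and on y)

Q: number of discordant pairs (the two data points are ordered in opposite ways on x and on y)

X0: number of pairs tied only on x

Y0: number of pairs tied only on y

With these definitions, P + Q + Y0 is the number of pairs not tied on x and P + Q + X0 is the number of pairs not tied on y, so the divisor is their geometric mean.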
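A minimal R sketch of this formula, counting pairs by brute force. The function name tau_b is made up for illustration, and X0 and Y0 are taken here to count pairs tied on x only and on y only (pairs tied on both are ignored). As far as I know, cor(..., method = "kendall") in base R also returns tau-b, so the two should agree on the data from the question.

```r
# Hypothetical helper: Kendall's tau-b by enumerating all pairs of points.
tau_b <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  idx <- combn(length(x), 2)              # every unordered pair (i, j), i < j
  dx  <- sign(x[idx[1, ]] - x[idx[2, ]])  # ordering of the pair on x
  dy  <- sign(y[idx[1, ]] - y[idx[2, ]])  # ordering of the pair on y
  P   <- sum(dx * dy > 0)                 # concordant pairs
  Q   <- sum(dx * dy < 0)                 # discordant pairs
  X0  <- sum(dx == 0 & dy != 0)           # tied on x only
  Y0  <- sum(dy == 0 & dx != 0)           # tied on y only
  (P - Q) / sqrt((P + Q + Y0) * (P + Q + X0))
}

# Data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

tau_manual  <- tau_b(x, y)
tau_builtin <- cor(x, y, method = "kendall")
print(c(manual = tau_manual, builtin = tau_builtin))
```

Enumerating pairs is O(n^2), which is fine for small sets of paired rankings like these; for large n the built-in is the better choice.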

There is in fact a variant of Spearman's rho that explicitly accounts for ties. In situations where I needed a non-parametric rank correlation statistic, I have always chosen tau over rho. The reason is that rho sums the squared errors, whereas tau sums the absolute discrepancies. Given that both tau and rho are competent statistics and we are left to choose, a linear penalty on discrepancies (tau) has always seemed to me a more natural way to express rank correlation. That's not a recommendation; your context might be quite different and dictate otherwise.
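To make the tie-aware variant of rho concrete, here is a small R sketch on the data from the question. It assumes the variant meant is Pearson's correlation computed on average ranks, and compares it with the textbook shortcut 1 - 6*sum(d^2)/(n*(n^2-1)), which is only exact when there are no ties. The variable names are mine.

```r
# Data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)
n <- length(x)

# Average ranks: the six tied zeros in y all get rank 3.5
rx <- rank(x, ties.method = "average")
ry <- rank(y, ties.method = "average")

# Tie-aware rho: Pearson correlation of the average ranks
rho_ranks <- cor(rx, ry, method = "pearson")

# Shortcut formula, which assumes all ranks are distinct
d <- rx - ry
rho_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# What base R reports for Spearman
rho_builtin <- cor(x, y, method = "spearman")

print(c(ranks = rho_ranks, shortcut = rho_shortcut, builtin = rho_builtin))
```

On this data the rank-based value agrees with the built-in one, so the 0.7637626 in the question already reflects average ranks for the ties; the shortcut formula gives a somewhat higher value.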


answered Sep 21 '22 11:09

doug
