How do you calculate Spearman correlation by group in R? I found the following link about Pearson correlation by group, but when I tried to change the correlation type to Spearman, it did not work. https://stats.stackexchange.com/questions/4040/r-compute-correlation-by-group

Spearman correlation by group in R

4 Answers

How about this for a base R solution:

df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

r <- by(df, df$group, FUN = function(X) cor(X$var1, X$var2, method = "spearman"))
# df$group: G1
# [1] 0.4060606
# ------------------------------------------------------------ 
# df$group: G2
# [1] 0.1272727

And then, if you want the results in the form of a data.frame:

data.frame(group = dimnames(r)[[1]], corr = as.vector(r))
#   group      corr
# 1    G1 0.4060606
# 2    G2 0.1272727
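As a sanity check (my addition, not part of the original answer): Spearman's rho is Pearson's correlation computed on ranks, so ranking each variable inside the per-group function should give the same numbers as method = "spearman". A minimal sketch, assuming the same df layout (columns group, var1, var2); set.seed(1) is added only so the run is reproducible, so the values will differ from the output shown above:

```r
# Hypothetical reproducible data with the same layout as df above
set.seed(1)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

# Spearman via cor()'s method argument
rho_method <- by(df, df$group, function(X) cor(X$var1, X$var2, method = "spearman"))

# Spearman by hand: Pearson correlation of the ranks
# (rank() gives tied values their average rank, which is what cor() uses for Spearman)
rho_ranks <- by(df, df$group, function(X) cor(rank(X$var1), rank(X$var2)))

all.equal(as.vector(rho_method), as.vector(rho_ranks))
```

This is also why switching to Spearman only needs the method argument of cor() rather than a different function.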

EDIT: If you prefer a plyr-based solution, here is one:

library(plyr)
ddply(df, .(group), summarise, "corr" = cor(var1, var2, method = "spearman"))


answered Oct 02 '22 22:10

Josh O'Brien

Very old question, but this tidyverse & broom solution is extremely straightforward, so I have to share it:

set.seed(123)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

library(tidyverse)
library(broom)

df  %>% 
  group_by(group) %>%
  summarize(correlation = cor(var1, var2, method = "sp"))
# A tibble: 2 x 2
  group correlation
  <fct>       <dbl>
1 G1        -0.200 
2 G2         0.0545

# with pvalues and further stats
df %>% 
  nest(-group) %>% 
  mutate(cor=map(data,~cor.test(.x$var1, .x$var2, method = "sp"))) %>%
  mutate(tidied = map(cor, tidy)) %>% 
  unnest(tidied, .drop = T)
# A tibble: 2 x 6
  group estimate statistic p.value method                          alternative
  <fct>    <dbl>     <dbl>   <dbl> <chr>                           <chr>      
1 G1     -0.200        198   0.584 Spearman's rank correlation rho two.sided  
2 G2      0.0545       156   0.892 Spearman's rank correlation rho two.sided

Since tidyr 1.0.0 (nest() and unnest() come from tidyr, not dplyr), you need to write it like this to get the results above without errors:

df %>% 
  nest(data = -group) %>%
  mutate(cor=map(data,~cor.test(.x$var1, .x$var2, method = "sp"))) %>%
  mutate(tidied = map(cor, tidy)) %>% 
  unnest(tidied) %>% 
  select(-data, -cor)
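In recent dplyr versions, group_modify() can stand in for the nest/map/unnest round trip: it runs a function on each group's rows and row-binds the data frames it returns. A sketch, assuming df as defined above and that broom's tidy() turns a cor.test() result into a one-row tibble (it does for htest objects); the name res is just for illustration:

```r
library(dplyr)
library(broom)

set.seed(123)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

res <- df %>%
  group_by(group) %>%
  # .x is the current group's rows (without the grouping column);
  # tidy() converts the htest object into a one-row tibble
  group_modify(~ tidy(cor.test(.x$var1, .x$var2, method = "spearman"))) %>%
  ungroup()

res
```

The design choice is mostly about readability: there are no intermediate list-columns to create and then drop with select().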

answered Oct 02 '22 23:10

Roman

Here's another way to do it:

# split the data by group then apply spearman correlation
# to each element of that list
j <- lapply(split(df, df$group), function(x){cor(x[,2], x[,3], method = "spearman")})

# Bring it together
data.frame(group = names(j), corr = unlist(j), row.names = NULL)
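A variant of the same split/apply idea (my addition, not Dason's code) that refers to the columns by name instead of by position, so it keeps working if the column order changes, and uses vapply() so every group is guaranteed to return exactly one number. It assumes columns named var1 and var2; the names rho and out are illustrative:

```r
set.seed(42)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

# One Spearman rho per group; vapply() errors if any group returns
# something other than a single numeric value
rho <- vapply(split(df, df$group),
              function(x) cor(x$var1, x$var2, method = "spearman"),
              numeric(1))

# Bring it together
out <- data.frame(group = names(rho), corr = unname(rho))
out
```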

Comparing my method, Josh's method, and the plyr solution using rbenchmark:

Dason <- function(){
    # split the data by group then apply spearman correlation
    # to each element of that list
    j <- lapply(split(df, df$group), function(x){cor(x[,2], x[,3], method = "spearman")})

    # Bring it together
    data.frame(group = names(j), corr = unlist(j), row.names = NULL)
}

Josh <- function(){
    r <- by(df, df$group, FUN = function(X) cor(X$var1, X$var2, method = "spearman"))
    data.frame(group = attributes(r)$dimnames[[1]], corr = as.vector(r))
}

plyr <- function(){
    ddply(df, .(group), summarise, "corr" = cor(var1, var2, method = "spearman"))
}


library(rbenchmark)
benchmark(Dason(), Josh(), plyr())

Which gives the output:

> benchmark(Dason(), Josh(), plyr())
     test replications elapsed relative user.self sys.self user.child sys.child
1 Dason()          100    0.19 1.000000      0.19        0         NA        NA
2  Josh()          100    0.24 1.263158      0.22        0         NA        NA
3  plyr()          100    0.51 2.684211      0.52        0         NA        NA

So on this small example my method appears slightly faster, but not by much. I think Josh's method is a little more intuitive. The plyr solution is the easiest to code up but the slowest of the three (though it sure is a lot more convenient)!

answered Oct 02 '22 21:10

Dason

If you want an efficient solution for a large number of groups, data.table is the way to go.

library(data.table)
DT <- as.data.table(df)
setkey(DT, group)
DT[,list(corr = cor(var1,var2,method = 'spearman')), by = group]
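If you also want p-values in the data.table version, j can be a braced expression that runs cor.test() once per group and returns a list of columns. A sketch, assuming the same column names; the output names rho and p.value are my own choice:

```r
library(data.table)

set.seed(123)
DT <- data.table(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

# j is evaluated once per group; the last value of the braced block
# (a list) becomes that group's row in the result
res <- DT[, {
  ct <- cor.test(var1, var2, method = "spearman")
  list(rho = unname(ct$estimate), p.value = ct$p.value)
}, by = group]

res
```

Computing the test once inside the braces avoids calling cor.test() separately for each output column.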

answered Oct 02 '22 22:10

mnel

Tags:

r

user1009166
