Correlation by row, within a data frame

Tags: list, dataframe, r, rows, correlation

I am attempting to calculate the correlation between all the rows of a large data frame, and so far have come up with a simple for-loop that works. For example:

name <- c("a", "b", "c", "d")
col1 <- c(43.78, 43.84, 37.92, 31.72)
col2 <- c(43.80, 43.40, 37.64, 31.62)
col3 <- c(43.14, 42.85, 37.54, 31.74)
df <- data.frame(name, col1, col2, col3)
cor.df <- data.frame(name1=NA, name2=NA,correl=NA)

for(i in 1: (nrow(df) - 1))  {
  for(j in (i+1): nrow(df) ) {
    v1 <- as.numeric( df[i, 2:ncol(df)] )
    v2 <- as.numeric( df[j, 2:ncol(df)] )
    correl <- cor(v1, v2)

    name1 <- df[i, "name"]
    name2 <- df[j, "name"]

    dftemp <- data.frame(name1, name2, correl)
    cor.df <- rbind(cor.df, dftemp)
   }
}

na.omit(cor.df)

#    name1 name2     correl
#     a     b      0.8841255
#     a     c      0.6842705
#     a     d     -0.6491118
#     b     c      0.9457125
#     b     d     -0.2184630
#     c     d      0.1105508

Given the large data frame and the inefficient for-loop, the correlation computation takes a long time. Does anyone have suggestions for making it faster? Note that I have many data frames in a list, so I could use lapply, but I have not figured out how to write that line of code.

asked Oct 30 '17 13:10

fragf

1 Answer

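Drop the first column, transpose, and use the cor function (from the stats package, which is loaded by default). cor computes correlations between the columns of a matrix, so transposing turns the rows into the columns it compares: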

> cor(t(df[-1]))
           [,1]       [,2]      [,3]       [,4]
[1,]  1.0000000  0.8841255 0.6842705 -0.6491118
[2,]  0.8841255  1.0000000 0.9457125 -0.2184630
[3,]  0.6842705  0.9457125 1.0000000  0.1105508
[4,] -0.6491118 -0.2184630 0.1105508  1.0000000
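
As a quick sanity check, one entry of the matrix can be compared against the pairwise calculation from the question's loop. This is only a sketch that rebuilds the example data; the names m and pair_12 are made up for illustration.

```r
# Rebuild the example data from the question
name <- c("a", "b", "c", "d")
col1 <- c(43.78, 43.84, 37.92, 31.72)
col2 <- c(43.80, 43.40, 37.64, 31.62)
col3 <- c(43.14, 42.85, 37.54, 31.74)
df <- data.frame(name, col1, col2, col3)

# Correlation matrix between rows: transpose so rows become columns
m <- cor(t(df[-1]))

# Same pair computed the way the original loop does it (rows 1 and 2)
pair_12 <- cor(as.numeric(df[1, -1]), as.numeric(df[2, -1]))

# The matrix entry and the loop value should agree
all.equal(m[1, 2], pair_12)
```

The whole matrix comes from a single vectorised call, which is where the speed-up over the nested loop and repeated rbind comes from.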

# pretty output
x <- cor(t(df[, -1]))
x[upper.tri(x, diag = TRUE)] <- NA
rownames(x) <- colnames(x) <- df$name
x <- na.omit(reshape::melt(t(x)))
x <- x[ order(x$X1, x$X2), ]

x
#    X1 X2      value
# 5   a  b  0.8841255
# 9   a  c  0.6842705
# 13  a  d -0.6491118
# 10  b  c  0.9457125
# 14  b  d -0.2184630
# 15  c  d  0.1105508
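
The question also mentions a list of data frames. A minimal sketch of how the same idea might be wrapped for lapply follows, using base R only instead of reshape::melt. The helper name row_cor_pairs and the list df_list are hypothetical, and the sketch assumes every data frame keeps its labels in the first column with all remaining columns numeric.

```r
# Example data from the question
name <- c("a", "b", "c", "d")
col1 <- c(43.78, 43.84, 37.92, 31.72)
col2 <- c(43.80, 43.40, 37.64, 31.62)
col3 <- c(43.14, 42.85, 37.54, 31.74)
df <- data.frame(name, col1, col2, col3)

# Hypothetical helper: long-format pairwise row correlations for one data frame.
# Assumes column 1 holds the row labels and all other columns are numeric.
row_cor_pairs <- function(d) {
  m <- cor(t(d[-1]))                          # correlations between rows
  idx <- which(upper.tri(m), arr.ind = TRUE)  # index pairs (i, j) with i < j
  labels <- as.character(d[[1]])
  out <- data.frame(
    name1  = labels[idx[, "row"]],
    name2  = labels[idx[, "col"]],
    correl = m[idx]
  )
  out <- out[order(out$name1, out$name2), ]   # same ordering as the loop output
  rownames(out) <- NULL
  out
}

# Placeholder list standing in for the real list of data frames
df_list <- list(first = df, second = df)

# One result data frame per list element
cor_list <- lapply(df_list, row_cor_pairs)
cor_list$first
```

Indexing the matrix with the two-column result of which(..., arr.ind = TRUE) picks each pair once, so no NA padding or na.omit step is needed.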

answered Sep 19 '22 17:09

amrrs
