 

Creating a correlation matrix from a data frame in R

I have a data frame of correlations that looks something like this (my real data has ~15,000 rows):

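phen1 <- c("A", "B", "C")
phen2 <- c("B", "C", "A")
cors <- c(0.3, 0.7, 0.8)

# data.frame() keeps cors numeric; as.data.frame(cbind(...)) would make every column character
data <- data.frame(phen1, phen2, cors, stringsAsFactors = FALSE)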

    phen1  phen2   cors
1     A      B      0.3
2     B      C      0.7
3     C      A      0.8

This was created externally and read into R. I want to convert this data frame into a correlation matrix, with the labels in phen1 and phen2 as the row and column names. I have only calculated the correlations for one triangle (lower or upper), and I don't have the 1s for the diagonal. The end result should be a full correlation matrix. I think a first step would be to build the lower or upper triangle and then convert it to a full matrix, but I'm unsure how to do either step.

Also, the rows may not be in an intuitive order. I'm not sure whether this matters, but ideally I would like an approach that uses the labels in phen1 and phen2 to make sure each value ends up in the correct place in the matrix.

Essentially, I would want something like this as the end result:

  A    B    C
A 1    0.3  0.8
B 0.3  1    0.7
C 0.8  0.7  1
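For reference, a minimal base-R sketch of the two steps described above, assuming `data` has character columns phen1 and phen2 and a numeric cors column. Each value is placed by its labels using character-matrix indexing, so the row order of the input does not matter, and the missing triangle is then filled by mirroring. The names `labs` and `m` are illustrative only.

```r
# Long-format input: one row per pair, as in the question
data <- data.frame(phen1 = c("A", "B", "C"),
                   phen2 = c("B", "C", "A"),
                   cors  = c(0.3, 0.7, 0.8),
                   stringsAsFactors = FALSE)

# Row/column labels: every phenotype seen in either label column
labs <- sort(unique(c(data$phen1, data$phen2)))

# Step 1: empty matrix, then place each value by its (row label, column label) pair
m <- matrix(NA_real_, nrow = length(labs), ncol = length(labs),
            dimnames = list(labs, labs))
m[cbind(data$phen1, data$phen2)] <- data$cors

# Step 2: copy each value to the mirror position wherever that cell is still empty
m[is.na(m)] <- t(m)[is.na(m)]

# Step 3: a variable's correlation with itself is 1
diag(m) <- 1
m
```

Pairs that never appear in the input stay NA rather than becoming 0. If the same pair is supplied in both orientations with different values, both are kept as given and the result will not be symmetric; `isSymmetric(m)` is a quick check.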
user5481267 asked Sep 12 '19 09:09



2 Answers

Here is another approach in base R. We create a mirrored copy of data with the phen1 and phen2 columns swapped and stack it under the original, so every pair appears in both orientations. Then we use xtabs to cross-tabulate cors into a square matrix and set the diagonal to 1.

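# Mirror the pairs: swap phen1 and phen2 so every correlation appears in both orientations
data1 <- data.frame(phen1 = data$phen2, phen2 = data$phen1, cors = data$cors)  
df <- rbind(data, data1)
# Cross-tabulate cors with phen1 as rows and phen2 as columns
df1 <- as.data.frame.matrix(xtabs(cors ~ ., df))
# Self-correlations on the diagonal
diag(df1) <- 1
df1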

#    A   B   C
#A 1.0 0.3 0.8
#B 0.3 1.0 0.7
#C 0.8 0.7 1.0
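One caveat with this approach: when a response is on the left of `~`, xtabs sums that response within each cell. A pair missing from the input therefore comes out as 0 rather than NA, and a pair listed in both orientations in the input would be added twice. A minimal sketch that counts pairs alongside the sums to tell these cases apart; `data_missing` is a hypothetical input in which the C-A pair is absent, and `sums`/`counts` are illustrative names.

```r
# Hypothetical input: the C-A pair is missing
data_missing <- data.frame(phen1 = c("A", "B"),
                           phen2 = c("B", "C"),
                           cors  = c(0.3, 0.7),
                           stringsAsFactors = FALSE)

# Stack with a copy whose label columns are swapped, as in the answer above
swapped <- data.frame(phen1 = data_missing$phen2,
                      phen2 = data_missing$phen1,
                      cors  = data_missing$cors,
                      stringsAsFactors = FALSE)
stacked <- rbind(data_missing, swapped)

# Sum of cors per cell, and number of input rows contributing to each cell
sums   <- as.matrix(as.data.frame.matrix(xtabs(cors ~ phen1 + phen2, stacked)))
counts <- as.matrix(as.data.frame.matrix(xtabs(~ phen1 + phen2, stacked)))

# A count above 1 means the pair was supplied more than once and its values were summed
if (any(counts > 1)) warning("some pairs appear more than once")

sums[counts == 0] <- NA  # pair never supplied: unknown, not a correlation of 0
diag(sums) <- 1
sums
```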

Data used (built with data.frame() so that cors stays numeric):

phen1<-c("A","B","C")
phen2<-c("B","C","A")
cors<-c(0.3,0.7,0.8)
data<- data.frame(phen1, phen2, cors)
Ronak Shah answered Sep 20 '22 09:09



There is probably a more elegant way to do it, but here is a dplyr and tidyr possibility:

library(dplyr)
library(tidyr)

data %>%
 spread(phen1, cors) %>%
 rename(phen = "phen2") %>%
 bind_rows(data %>%
            spread(phen2, cors) %>%
            rename(phen = "phen1")) %>%
 group_by(phen) %>%
 summarise_all(~ ifelse(all(is.na(.)), 1, first(na.omit(.))))

  phen      A     B     C
  <chr> <dbl> <dbl> <dbl>
1 A       1     0.3   0.8
2 B       0.3   1     0.7
3 C       0.8   0.7   1  
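The result above is a tibble with the labels in a `phen` column rather than a numeric matrix. If a true matrix is needed (for example for `isSymmetric()` or functions that expect a correlation matrix), a minimal base-R conversion could look like this. Here `res` is a hypothetical stand-in that recreates the printed result as a plain data frame so the snippet runs on its own; the same lines should also work on the tibble itself, and `tibble::column_to_rownames()` is an alternative for the row-name step.

```r
# Stand-in for the summarised result: a label column plus one column per phenotype
res <- data.frame(phen = c("A", "B", "C"),
                  A = c(1, 0.3, 0.8),
                  B = c(0.3, 1, 0.7),
                  C = c(0.8, 0.7, 1),
                  stringsAsFactors = FALSE)

# Drop the label column and use it as row names instead
m <- as.matrix(res[, -1])
rownames(m) <- res$phen

# Order the columns like the rows so the diagonal lines up
m <- m[, rownames(m)]
m
```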
tmfmnk answered Sep 19 '22 09:09
