
Transform Correlation Matrix into dataframe with records for each row column pair

I have a large matrix of correlations (1093 x 1093). I'm trying to convert my matrix into a data frame that has one record for every row and column pair, so it would have (1093)^2 records.

Here's a snippet of my matrix:

            60516        45264        02117
60516  1.00000000 -0.370793012 -0.082897941
45264 -0.37079301  1.000000000  0.005145601
02117 -0.08289794  0.005145601  1.000000000

The goal from here would be to have a dataframe that looks like this:

row column correlation
60516 60516 1.000000000
60516 45264 -0.370793012

........ and so on.

Anyone have any tips? Let me know if I can clarify anything

Thanks, Ben

ben890 asked Jan 19 '15 23:01


2 Answers

With a bit of tidyverse this is easy:

given a correlation matrix X:

library(magrittr)  # provides the %>% pipe (dplyr re-exports it too)

X %>% as.data.frame() %>% tibble::rownames_to_column() %>%
    tidyr::pivot_longer(-rowname)

You can, of course, change the column names using the var argument of rownames_to_column and the names_to and values_to arguments of pivot_longer, and you can also add dplyr::filter(rowname != name) to remove the diagonal correlations.
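As a rough sketch of the renaming and filtering described above, assuming dplyr, tidyr and tibble are installed, and using a small stand-in matrix built from the snippet in the question (the names `ids`, `long` and `off_diag` are only illustrative):

```r
library(dplyr)   # %>% and filter()
library(tidyr)   # pivot_longer()
library(tibble)  # rownames_to_column()

# Small stand-in for the 1093 x 1093 correlation matrix
ids <- c("60516", "45264", "02117")
X <- matrix(c( 1.000000000, -0.370793012, -0.082897941,
              -0.370793012,  1.000000000,  0.005145601,
              -0.082897941,  0.005145601,  1.000000000),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# One record per (row, column) pair, with the column names from the question
long <- X %>%
  as.data.frame() %>%
  rownames_to_column(var = "row") %>%
  pivot_longer(-row, names_to = "column", values_to = "correlation")

# Optionally drop the diagonal (each variable paired with itself)
off_diag <- long %>% filter(row != column)
```

Here `long` should have 9 records and `off_diag` 6; on the full matrix the same pipeline would give 1093^2 records before filtering.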

Bakaburg answered Oct 23 '22 00:10


For matrix m, you could do:

data.frame(row=rownames(m)[row(m)], col=colnames(m)[col(m)], corr=c(m))

#     row   col         corr
# 1 60516 60516  1.000000000
# 2 45264 60516 -0.370793010
# 3 02117 60516 -0.082897940
# 4 60516 45264 -0.370793012
# 5 45264 45264  1.000000000
# 6 02117 45264  0.005145601
# 7 60516 02117 -0.082897941
# 8 45264 02117  0.005145601
# 9 02117 02117  1.000000000
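To see why this works, it may help to look at what row(m) and col(m) return. The following is a minimal sketch on a stand-in matrix built from the question's snippet (the names `ids` and `long` are only illustrative):

```r
# Stand-in for the correlation matrix m
ids <- c("60516", "45264", "02117")
m <- matrix(c( 1.000000000, -0.370793012, -0.082897941,
              -0.370793012,  1.000000000,  0.005145601,
              -0.082897941,  0.005145601,  1.000000000),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# row(m) and col(m) are integer matrices with the same shape as m,
# holding each cell's row index and column index respectively
row(m)
col(m)

# Indexing the name vectors with those index matrices gives plain
# vectors in column-major order, which is the same order c(m) uses
long <- data.frame(row  = rownames(m)[row(m)],
                   col  = colnames(m)[col(m)],
                   corr = c(m))
```

Because all three columns are laid out in the same column-major order, each record pairs a value with the labels of the cell it came from.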

But if your matrix is symmetric (as a correlation matrix is) and you are not interested in the diagonal, then you can keep just the upper triangle:

data.frame(row=rownames(m)[row(m)[upper.tri(m)]], 
           col=colnames(m)[col(m)[upper.tri(m)]], 
           corr=m[upper.tri(m)])

#     row   col         corr
# 1 60516 45264 -0.370793012
# 2 60516 02117 -0.082897941
# 3 45264 02117  0.005145601
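The upper-triangle version relies on upper.tri(m) returning a logical mask that is TRUE strictly above the diagonal. A small sketch, again on a stand-in matrix (the names `ids`, `keep` and `pairs` are only illustrative), checks that it yields n * (n - 1) / 2 records, which would be 596,778 for a 1093 x 1093 matrix:

```r
# Stand-in for the correlation matrix m
ids <- c("60516", "45264", "02117")
m <- matrix(c( 1.000000000, -0.370793012, -0.082897941,
              -0.370793012,  1.000000000,  0.005145601,
              -0.082897941,  0.005145601,  1.000000000),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Logical mask: TRUE for cells strictly above the diagonal
keep <- upper.tri(m)

# Use the same mask for the labels and the values so they stay aligned
pairs <- data.frame(row  = rownames(m)[row(m)[keep]],
                    col  = colnames(m)[col(m)[keep]],
                    corr = m[keep])

# Each unordered pair appears exactly once
n <- nrow(m)
stopifnot(nrow(pairs) == n * (n - 1) / 2)
```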
jbaums answered Oct 22 '22 23:10