I have a large matrix of correlations (1093 x 1093). I'm trying to convert my matrix into a data frame that has one record for every row and column pair, so it would have (1093)^2 records.
Here's a snippet of my matrix:
            60516        45264        02117
60516  1.00000000 -0.370793012 -0.082897941
45264 -0.37079301  1.000000000  0.005145601
02117 -0.08289794  0.005145601  1.000000000
The goal from here would be to have a dataframe that looks like this:
row column correlation
60516 60516 1.000000000
60516 45264 -0.370793012
........ and so on.
Anyone have any tips? Let me know if I can clarify anything
Thanks, Ben
With a bit of tidyverse this is easy. Given a correlation matrix X:
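library(magrittr)  # provides the %>% pipe
# one record per (row, column) pair, in columns rowname, name and value
X %>% as.data.frame %>% tibble::rownames_to_column() %>%
  tidyr::pivot_longer(-rowname)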
You can, of course, change the column names using the var argument of rownames_to_column and the names_to and values_to arguments of pivot_longer, and you can also add dplyr::filter(rowname != name) to remove the diagonal correlations.
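As a rough sketch of the renaming and filtering described above, the pipeline could look like the following. The small matrix X here is only a stand-in built from the three values shown in the question, and the output column names (row, column, correlation) are chosen to match the layout the question asks for; it assumes dplyr, tibble and tidyr are installed.

```r
library(dplyr)   # provides %>% and filter()
library(tibble)  # rownames_to_column()
library(tidyr)   # pivot_longer()

# Stand-in for the real 1093 x 1093 matrix, using the values from the question
ids <- c("60516", "45264", "02117")
X <- matrix(c( 1.00000000, -0.370793012, -0.082897941,
              -0.37079301,  1.000000000,  0.005145601,
              -0.08289794,  0.005145601,  1.000000000),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# One record per (row, column) pair, with the column names from the question
long <- X %>%
  as.data.frame() %>%
  rownames_to_column(var = "row") %>%
  pivot_longer(-row, names_to = "column", values_to = "correlation")

# Optionally drop the diagonal (each variable's correlation with itself)
no_diag <- long %>% filter(row != column)
```

Here long has one record per cell of X, and no_diag keeps only the off-diagonal pairs. Note that both orderings of each pair are still present in no_diag.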
For a matrix m, you could do:
data.frame(row=rownames(m)[row(m)], col=colnames(m)[col(m)], corr=c(m))
# row col corr
# 1 60516 60516 1.000000000
# 2 45264 60516 -0.370793010
# 3 02117 60516 -0.082897940
# 4 60516 45264 -0.370793012
# 5 45264 45264 1.000000000
# 6 02117 45264 0.005145601
# 7 60516 02117 -0.082897941
# 8 45264 02117 0.005145601
# 9 02117 02117 1.000000000
But if your matrix is symmetrical and if you are not interested in the diagonal, then you can simplify it to:
data.frame(row=rownames(m)[row(m)[upper.tri(m)]],
           col=colnames(m)[col(m)[upper.tri(m)]],
           corr=m[upper.tri(m)])
# row col corr
# 1 60516 45264 -0.370793012
# 2 60516 02117 -0.082897941
# 3 45264 02117 0.005145601
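To try both base R variants on the sample data, here is a minimal sketch that rebuilds the 3 x 3 snippet from the question as m and checks the record counts. The names full, pairs and keep are only illustrative. For an n x n matrix the full version gives n^2 records and the upper-triangle version gives n(n-1)/2, so for n = 1093 that would be 1,194,649 and 596,778 records respectively.

```r
# Rebuild the sample matrix from the question (stand-in for the real data)
ids <- c("60516", "45264", "02117")
m <- matrix(c( 1.00000000, -0.370793012, -0.082897941,
              -0.37079301,  1.000000000,  0.005145601,
              -0.08289794,  0.005145601,  1.000000000),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Every (row, column) pair, including the diagonal and both orderings
full <- data.frame(row = rownames(m)[row(m)],
                   col = colnames(m)[col(m)],
                   corr = c(m))

# Each unordered pair once, without the diagonal
keep <- upper.tri(m)
pairs <- data.frame(row = rownames(m)[row(m)[keep]],
                    col = colnames(m)[col(m)[keep]],
                    corr = m[keep])

# Sanity check on the record counts
n <- nrow(m)
stopifnot(nrow(full) == n^2, nrow(pairs) == n * (n - 1) / 2)
```

Computing upper.tri(m) once and reusing it avoids building the same logical matrix three times, which may matter a little more at 1093 x 1093 than it does here.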