Finding row/column names from correlation matrix values

I have a correlation matrix that contains stock price correlations. It was calculated via:

corMatrix <- cor(cl2014, use="pairwise.complete.obs")

The matrix is much bigger but looks like this:

> corMatrix
             RY.TO.Close CM.TO.Close BNS.TO.Close TD.TO.Close
RY.TO.Close    1.0000000   0.8990782    0.8700985  -0.2505789
CM.TO.Close    0.8990782   1.0000000    0.8240780  -0.4184085
BNS.TO.Close   0.8700985   0.8240780    1.0000000  -0.2141785
TD.TO.Close   -0.2505789  -0.4184085   -0.2141785   1.0000000

> class(corMatrix)
[1] "matrix"

I'm trying to determine how I can get the row and column names of all values in the matrix that have a correlation greater than some value.

I can compare the matrix against a threshold to generate a logical matrix like so:

workingset <- corMatrix > 0.85

What I really want is just a list of row/col pairs identified by the row and column name so I know what pairs to do further exploration on.

How can I go from the indexing grid to the row/column names?

Ideally I'd also examine only the lower or upper triangle of the matrix so as not to generate duplicate pairs, and of course the main diagonal can be ignored since it will always be 1.

asked Oct 31 '14 02:10

chollida

1 Answer

Another option is to use melt from "reshape2" and subset:

library(reshape2)
subset(melt(corMatrix), value > .85)
#            Var1         Var2     value
# 1   RY.TO.Close  RY.TO.Close 1.0000000
# 2   CM.TO.Close  RY.TO.Close 0.8990782
# 3  BNS.TO.Close  RY.TO.Close 0.8700985
# 5   RY.TO.Close  CM.TO.Close 0.8990782
# 6   CM.TO.Close  CM.TO.Close 1.0000000
# 9   RY.TO.Close BNS.TO.Close 0.8700985
# 11 BNS.TO.Close BNS.TO.Close 1.0000000
# 16  TD.TO.Close  TD.TO.Close 1.0000000

You would need to use melt(as.matrix(corMatrix)) if your data is a data.frame, since melt has different methods for matrices and data.frames.

Update

Since you mention you're only interested in the values from one triangle (to avoid duplicate pairs/values) and want to exclude the diagonal, you can keep just the upper triangle as follows:

CM <- corMatrix                               # Make a copy of your matrix
CM[lower.tri(CM, diag = TRUE)] <- NA          # lower tri and diag set to NA
subset(melt(CM, na.rm = TRUE), value > .85)   # melt and subset as before
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
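
To see what the masking step does on its own, here is a minimal base R sketch on a made-up 3x3 matrix. The matrix `m` is purely illustrative and not from the question; the point is that lower.tri(..., diag = TRUE) marks the diagonal and everything below it, so only the strict upper triangle survives.

```r
# Hypothetical 3x3 matrix, filled column by column with 1..9
m <- matrix(1:9, nrow = 3)

# TRUE on the diagonal and below it, FALSE strictly above the diagonal
mask <- lower.tri(m, diag = TRUE)
print(mask)

# Blank out the masked cells; only the strict upper triangle keeps its values
m[mask] <- NA
print(m)

# The surviving values, read in column-major order
kept <- m[!is.na(m)]
print(kept)
```

Each unordered pair now appears once, which is why the later subset no longer returns mirrored duplicates or the self-correlations of 1.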

You could also do this with base R. Continuing with "CM" from above, try:

subset(na.omit(data.frame(expand.grid(dimnames(CM)), value = c(CM))), value > .85)
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
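
If you would rather go straight from the logical matrix to the names, one more base R possibility is which() with arr.ind = TRUE. This is only a sketch, not part of the approaches above: it rebuilds the four-column example from the question so it runs on its own, and the names `hits` and `pairs` are made up for illustration.

```r
# Rebuild the small example matrix from the question so the sketch is self-contained
nms <- c("RY.TO.Close", "CM.TO.Close", "BNS.TO.Close", "TD.TO.Close")
corMatrix <- matrix(
  c( 1.0000000,  0.8990782,  0.8700985, -0.2505789,
     0.8990782,  1.0000000,  0.8240780, -0.4184085,
     0.8700985,  0.8240780,  1.0000000, -0.2141785,
    -0.2505789, -0.4184085, -0.2141785,  1.0000000),
  nrow = 4, dimnames = list(nms, nms)
)

# Positions that exceed the threshold AND sit strictly above the diagonal
hits <- which(corMatrix > 0.85 & upper.tri(corMatrix), arr.ind = TRUE)

# Translate the integer positions back into row/column names
pairs <- data.frame(
  row   = rownames(corMatrix)[hits[, "row"]],
  col   = colnames(corMatrix)[hits[, "col"]],
  value = corMatrix[hits],
  stringsAsFactors = FALSE
)
print(pairs)
```

This skips the copy-and-NA step because upper.tri() is folded into the condition, so the original matrix is left untouched.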

answered Nov 08 '22 20:11

A5C1D2H2I1M1N2O1R2T1

Tags: r, matrix, correlation