I have a correlation matrix that contains stock price correlations. It was calculated via:
corMatrix <- cor(cl2014, use="pairwise.complete.obs")
The matrix is much bigger but looks like this:
> corMatrix
RY.TO.Close CM.TO.Close BNS.TO.Close TD.TO.Close
RY.TO.Close 1.0000000 0.8990782 0.8700985 -0.2505789
CM.TO.Close 0.8990782 1.0000000 0.8240780 -0.4184085
BNS.TO.Close 0.8700985 0.8240780 1.0000000 -0.2141785
TD.TO.Close -0.2505789 -0.4184085 -0.2141785 1.0000000
> class(corMatrix)
[1] "matrix"
I'm trying to determine how I can get the row and column names of all values in the matrix that have a correlation greater than some value.
I can index the matrix to generate an index matrix like so:
workingset <- corMatrix > 0.85
What I really want is just a list of row/col pairs identified by the row and column name so I know what pairs to do further exploration on.
How can I go from the indexing grid to the row/column names?
Ideally, I'd also examine only the lower or upper triangle of the matrix so as not to generate duplicate pairs, and of course the main diagonal can be ignored, as it will always be 1.
Another option is to use melt from "reshape2" and subset:
library(reshape2)
subset(melt(corMatrix), value > .85)
# Var1 Var2 value
# 1 RY.TO.Close RY.TO.Close 1.0000000
# 2 CM.TO.Close RY.TO.Close 0.8990782
# 3 BNS.TO.Close RY.TO.Close 0.8700985
# 5 RY.TO.Close CM.TO.Close 0.8990782
# 6 CM.TO.Close CM.TO.Close 1.0000000
# 9 RY.TO.Close BNS.TO.Close 0.8700985
# 11 BNS.TO.Close BNS.TO.Close 1.0000000
# 16 TD.TO.Close TD.TO.Close 1.0000000
You would need to do melt(as.matrix(corMatrix)) if your dataset is a data.frame, since there are different melt methods for matrices and data.frames.
Since you mention you're only interested in the values from the upper triangle (to avoid duplicate pairs/values), excluding the diagonal, you can do the following:
CM <- corMatrix # Make a copy of your matrix
CM[lower.tri(CM, diag = TRUE)] <- NA # lower tri and diag set to NA
subset(melt(CM, na.rm = TRUE), value > .85) # melt and subset as before
# Var1 Var2 value
# 5 RY.TO.Close CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
You could also do this with base R. Continuing with "CM" from above, try:
subset(na.omit(data.frame(expand.grid(dimnames(CM)), value = c(CM))), value > .85)
# Var1 Var2 value
# 5 RY.TO.Close CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
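For comparison, here is a minimal base R sketch that skips the NA-masking step and instead combines the threshold test with upper.tri() inside which(..., arr.ind = TRUE). It rebuilds the four-ticker example from the question so it can be run on its own. The names threshold, idx, and highPairs are just illustrative, and I'm assuming the matrix has both row and column names set, as cor() output normally does.

```r
# Rebuild the small example matrix from the question (values copied from above).
tickers <- c("RY.TO.Close", "CM.TO.Close", "BNS.TO.Close", "TD.TO.Close")
corMatrix <- matrix(
  c( 1.0000000,  0.8990782,  0.8700985, -0.2505789,
     0.8990782,  1.0000000,  0.8240780, -0.4184085,
     0.8700985,  0.8240780,  1.0000000, -0.2141785,
    -0.2505789, -0.4184085, -0.2141785,  1.0000000),
  nrow = 4, byrow = TRUE, dimnames = list(tickers, tickers)
)

# Keep only cells above the threshold that sit strictly above the diagonal.
# which() silently drops NA cells, which pairwise.complete.obs can produce.
threshold <- 0.85  # illustrative cut-off, same value as in the question
idx <- which(corMatrix > threshold & upper.tri(corMatrix), arr.ind = TRUE)

# Translate row/column positions into names and attach the correlations.
highPairs <- data.frame(
  row   = rownames(corMatrix)[idx[, "row"]],
  col   = colnames(corMatrix)[idx[, "col"]],
  value = corMatrix[idx],
  stringsAsFactors = FALSE
)
print(highPairs)
```

On the example data this should give the same two pairs as the melt version (RY/CM and RY/BNS). If strongly negative correlations matter too, testing abs(corMatrix) > threshold instead would be one way to catch them.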