
Finding row/column names from a correlation matrix values

I have a correlation matrix that contains stock price correlations. It was calculated via:

corMatrix <- cor(cl2014, use="pairwise.complete.obs")

The matrix is much bigger but looks like this:

> corMatrix
             RY.TO.Close CM.TO.Close BNS.TO.Close TD.TO.Close
RY.TO.Close    1.0000000   0.8990782    0.8700985  -0.2505789
CM.TO.Close    0.8990782   1.0000000    0.8240780  -0.4184085
BNS.TO.Close   0.8700985   0.8240780    1.0000000  -0.2141785
TD.TO.Close   -0.2505789  -0.4184085   -0.2141785   1.0000000

> class(corMatrix)
[1] "matrix"

I'm trying to determine how I can get the row and column names of all values in the matrix that have a correlation greater than some value.

I can compare the matrix against a threshold to generate a logical matrix like so:

workingset <- corMatrix > 0.85

What I really want is just a list of row/col pairs identified by the row and column name so I know what pairs to do further exploration on.

How can I go from the indexing grid to the row/column names?

Ideally, I'd also examine only the lower or upper triangle of the matrix so as not to generate duplicate pairs, and of course the main diagonal can be ignored, as it will always be 1.

asked Oct 31 '14 02:10 by chollida




1 Answer

One option is to use melt() from the reshape2 package and then subset:

library(reshape2)
subset(melt(corMatrix), value > .85)
#            Var1         Var2     value
# 1   RY.TO.Close  RY.TO.Close 1.0000000
# 2   CM.TO.Close  RY.TO.Close 0.8990782
# 3  BNS.TO.Close  RY.TO.Close 0.8700985
# 5   RY.TO.Close  CM.TO.Close 0.8990782
# 6   CM.TO.Close  CM.TO.Close 1.0000000
# 9   RY.TO.Close BNS.TO.Close 0.8700985
# 11 BNS.TO.Close BNS.TO.Close 1.0000000
# 16  TD.TO.Close  TD.TO.Close 1.0000000

You would need to do melt(as.matrix(corMatrix)) if your dataset is a data.frame since there are different melt methods for matrices and data.frames.


Update

Since you mention that you're only interested in one triangle of the matrix (to avoid duplicate pairs) and want to exclude the diagonal, you can keep just the upper triangle as follows:

CM <- corMatrix                               # Make a copy of your matrix
CM[lower.tri(CM, diag = TRUE)] <- NA          # lower tri and diag set to NA
subset(melt(CM, na.rm = TRUE), value > .85)   # melt and subset as before
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
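
To see what the masking step does on its own, here is a minimal sketch on a small made-up matrix. The name "m" and its values are hypothetical and used only for illustration; they are not part of the original data.

```r
# Hypothetical 3x3 matrix, filled column by column with 1..9
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# lower.tri(..., diag = TRUE) is TRUE on and below the main diagonal
mask <- lower.tri(m, diag = TRUE)

# Blank out those cells so only the strict upper triangle survives
m[mask] <- NA
m

# Values that remain, in column-major order
kept <- m[!is.na(m)]
kept
```

Only the cells strictly above the diagonal keep their values, which is why each pair then appears once and the self-correlations drop out.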

You could also do this with base R. Continuing with "CM" from above, try:

subset(na.omit(data.frame(expand.grid(dimnames(CM)), value = c(CM))), value > .85)
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
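
If you would rather start from the logical matrix in the question, another base R route is which() with arr.ind = TRUE, which returns the row and column index of every TRUE cell. The following is a sketch under assumptions, not a definitive implementation: it rebuilds the 4x4 example matrix by hand so it runs on its own, and the names "tickers", "idx" and "pairs" are made up for illustration.

```r
# Rebuild the example matrix from the question so the sketch is self-contained
tickers <- c("RY.TO.Close", "CM.TO.Close", "BNS.TO.Close", "TD.TO.Close")
corMatrix <- matrix(
  c( 1.0000000,  0.8990782,  0.8700985, -0.2505789,
     0.8990782,  1.0000000,  0.8240780, -0.4184085,
     0.8700985,  0.8240780,  1.0000000, -0.2141785,
    -0.2505789, -0.4184085, -0.2141785,  1.0000000),
  nrow = 4, byrow = TRUE, dimnames = list(tickers, tickers)
)

# TRUE where the value is above the threshold AND strictly above the diagonal
workingset <- corMatrix > 0.85 & upper.tri(corMatrix)

# One (row, col) index pair per TRUE cell
idx <- which(workingset, arr.ind = TRUE)

# Translate the indices back to names and pull the matching values
pairs <- data.frame(
  row   = rownames(corMatrix)[idx[, "row"]],
  col   = colnames(corMatrix)[idx[, "col"]],
  value = corMatrix[idx],
  stringsAsFactors = FALSE
)
pairs
```

On the example matrix this should pick out the same two pairs as above (RY.TO.Close with CM.TO.Close, and RY.TO.Close with BNS.TO.Close) without needing a copy of the matrix filled with NA.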

answered Nov 08 '22 20:11 by A5C1D2H2I1M1N2O1R2T1