I have an n x m matrix in which row i represents the time series of the variable V_i. I would like to compute the n x n correlation matrix M, where M_{i,j} contains the correlation coefficient (Pearson's r) between V_i and V_j.
However, when I try the following in numpy:
numpy.corrcoef(numpy.matrix('5 6 7; 1 1 1'))
I get the following output:
array([[ 1., nan],
       [nan, nan]])
It seems that numpy.corrcoef doesn't like constant vectors such as 1 1 1, because if I change the second row to 7 6 5, I get the expected result:
array([[ 1., -1.],
       [-1.,  1.]])
What is the reason for this behavior of numpy.corrcoef?
Interpreting the correlation matrix
Each cell in the grid holds the correlation coefficient between two variables. The matrix is square: each row represents a variable, and the columns represent the same variables in the same order, so the number of rows equals the number of columns.
DataFrame.corr() computes the pairwise correlation of all columns in a pandas DataFrame. NaN values are automatically excluded, and non-numeric columns are ignored.
In MATLAB, R = corrcoef(A) returns the matrix of correlation coefficients for A, where the columns of A represent random variables and the rows represent observations. R = corrcoef(A, B) returns the coefficients between two random variables A and B.
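For comparison, numpy.corrcoef treats each row as a variable by default, which matches the layout in the question; passing rowvar=False switches to the columns-as-variables convention described above. A small sketch, using the 5 6 7 and 7 6 5 rows from the question as illustrative data (the variable names are made up for this example):

```python
import numpy as np

# Illustrative data: 2 variables (rows), 3 observations (columns).
data = np.array([[5.0, 6.0, 7.0],
                 [7.0, 6.0, 5.0]])

# Default: each row is a variable, each column an observation.
by_rows = np.corrcoef(data)

# Same data transposed, telling numpy that columns are the variables.
by_cols = np.corrcoef(data.T, rowvar=False)

print(by_rows)
print(by_cols)
```

Both calls should yield the same 2 x 2 matrix, since they describe the same two variables in different layouts.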
leewangzhong (in the comment) is correct: Pearson's r is not defined for a constant time series, because its standard deviation is zero. Thanks!
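To make this concrete: Pearson's r divides the covariance of two series by the product of their standard deviations, so any pair that involves a zero-spread row becomes 0/0, which numpy reports as nan. The following is only a minimal sketch of how one might flag such rows before calling numpy.corrcoef; the names x, row_std and constant_rows are chosen for this example, and a plain float array is used instead of numpy.matrix.

```python
import numpy as np

x = np.array([[5.0, 6.0, 7.0],
              [1.0, 1.0, 1.0]])

# Standard deviation of each row; the constant row has no spread.
row_std = x.std(axis=1)

# max - min == 0 is an exact test for a constant row.
constant_rows = np.ptp(x, axis=1) == 0

# Suppress the floating-point warnings triggered by the 0/0 division.
with np.errstate(divide="ignore", invalid="ignore"):
    r = np.corrcoef(x)

print(row_std)
print(constant_rows)
print(r)
```

Every entry of r in a row or column flagged by constant_rows is nan, including the diagonal entry for the constant row itself. Whether to drop such rows or leave the nan entries in place depends on what the correlation matrix is used for.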