I have an n x m matrix in which row i represents the time series of the variable V_i. I would like to compute the n x n correlation matrix M, where M_{i,j} contains the correlation coefficient (Pearson's r) between V_i and V_j.
However, when I try the following in numpy:
import numpy
numpy.corrcoef(numpy.matrix('5 6 7; 1 1 1'))
I get the following output:
array([[ 1., nan],
[ nan, nan]])
It seems that numpy.corrcoef doesn't like constant rows such as 1 1 1, because if I change the second row to 7 6 5, I get the expected result:
array([[ 1., -1.],
[ -1., 1.]])
What is the reason for this kind of behavior of numpy.corrcoef?
Interpreting the correlation matrix: each cell in the grid holds the correlation coefficient between two variables. The matrix is square: each row represents a variable and the columns represent the same variables in the same order, so the number of rows equals the number of columns.
corr() computes the pairwise correlation of all columns in a pandas DataFrame. NaN values are excluded automatically, pair by pair. Non-numeric columns are ignored (in pandas 2.0 and later this requires passing numeric_only=True).
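To make the pandas behaviour concrete, here is a minimal sketch. The DataFrame and its column names (a, b, const, label) are made up for illustration, and numeric_only=True is passed explicitly so the example does not depend on the pandas version's default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN, one constant column, one non-numeric column.
df = pd.DataFrame({
    "a": [5.0, 6.0, 7.0, np.nan],
    "b": [7.0, 6.0, 5.0, 4.0],
    "const": [1.0, 1.0, 1.0, 1.0],
    "label": ["w", "x", "y", "z"],
})

# Pairwise Pearson correlation over the numeric columns only.
corr = df.corr(numeric_only=True)
print(corr)
```

The row containing the NaN is dropped only for pairs that involve column a, and the constant column produces NaN entries for the same reason as in the numpy example: its standard deviation is zero.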
In MATLAB, R = corrcoef(A) returns the matrix of correlation coefficients for A, where the columns of A represent random variables and the rows represent observations. R = corrcoef(A, B) returns the coefficients between two random variables A and B.
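Note that numpy.corrcoef treats rows as variables by default, which is the opposite of the column convention described above. A small sketch of how the two layouts relate, assuming the rowvar parameter of numpy.corrcoef (the array names are only illustrative):

```python
import numpy as np

# Two variables stored as rows, three observations each.
rows_as_vars = np.array([[5.0, 6.0, 7.0],
                         [7.0, 6.0, 5.0]])

# Default (rowvar=True): each row is a variable.
r_rows = np.corrcoef(rows_as_vars)

# Same data with variables in columns: tell numpy via rowvar=False.
r_cols = np.corrcoef(rows_as_vars.T, rowvar=False)

print(r_rows)
print(r_cols)
```

Both calls should give the same 2 x 2 matrix, since only the layout of the input differs.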
leewangzhong (in the comments) is correct: Pearson's r is not defined for a constant time series, because its standard deviation is zero and the formula divides by it. Thanks!
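For anyone who wants to see the division by zero directly, here is a rough sketch that computes Pearson's r from its definition, r = cov(a, b) / (std(a) * std(b)). The helper pearson_r is my own illustrative function, not part of numpy, and it uses population statistics, which gives the same r as the sample version because the normalisation cancels.

```python
import numpy as np

x = np.array([5.0, 6.0, 7.0])
c = np.array([1.0, 1.0, 1.0])  # constant series from the question


def pearson_r(a, b):
    """Pearson's r from its definition: cov(a, b) / (std(a) * std(b))."""
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    denom = a.std() * b.std()
    # Silence the 0/0 warning; the result is still nan when denom is zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / denom


print(c.std())               # standard deviation of the constant series
print(pearson_r(x, c))       # covariance 0 divided by 0
print(pearson_r(x, x[::-1])) # a non-constant series works as expected
```

If the NaN entries get in the way, one option is to drop the constant rows before calling numpy.corrcoef, since no correlation can be defined for them anyway.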