 

Correlation matrix in NumPy with NaN's

I have an n x m matrix in which row i represents the time series of the variable V_i. I would like to compute the n x n correlation matrix M, where M_{i,j} contains the correlation coefficient (Pearson's r) between V_i and V_j.

However, when I try the following in numpy:

numpy.corrcoef(numpy.matrix('5 6 7; 1 1 1'))

I get the following output:

array([[  1., nan],
       [ nan, nan]])

It seems that numpy.corrcoef doesn't like constant vectors, because if I change the second row to 7 6 5, I get the expected result:

array([[  1., -1.],
       [ -1.,  1.]])

What is the reason for this kind of behavior of numpy.corrcoef?

John Manak asked Dec 05 '13 14:12




1 Answer

leewangzhong (in the comments) is correct: Pearson's r is not defined for a constant time series, because its standard deviation is zero and the formula divides by the product of the two standard deviations. Thanks!
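For anyone who wants to see this in code, here is a minimal sketch (not part of the original exchange) that reproduces the NaN result and flags the rows responsible for it. The variable names `data`, `constant_rows`, and `corr` are just illustrative, and comparing each row's max and min is only one possible way to detect a constant row.

```python
import numpy as np

# Rows are variables, columns are observations (the default layout for np.corrcoef).
data = np.array([[5.0, 6.0, 7.0],
                 [1.0, 1.0, 1.0]])

# A row is constant when its largest and smallest values coincide;
# such a row has zero standard deviation, so r = cov / (std_i * std_j) becomes 0/0.
constant_rows = data.max(axis=1) == data.min(axis=1)

# np.corrcoef still runs, but emits RuntimeWarnings for the 0/0 divisions;
# np.errstate silences them for this call only.
with np.errstate(divide="ignore", invalid="ignore"):
    corr = np.corrcoef(data)

print(constant_rows)  # which rows are constant
print(corr)           # entries involving a constant row are NaN
```

Whether to drop the flagged rows, leave the NaN entries in place, or replace them with some convention such as 0 depends on what the matrix is used for afterwards; none of those options is "the" correct one.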

John Manak answered Oct 28 '22 15:10
