I am trying to port a MATLAB/Octave program to Python using NumPy 1.8.0 and Python 2.7.3. I've used this reference as help in converting MATLAB functions to NumPy methods with great success, until I get to the point where I want to compute the correlation between two matrices.
The first matrix is 40000x25 floats, the second matrix is 40000x1 ints. In Octave I use the statement corr(a,b) and get a 25x1 matrix of floats. Trying the corresponding method in NumPy (numpy.correlate(a,b)) produces an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_1a9aa5a_20130415-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 751, in correlate
return multiarray.correlate2(a,v,mode)
ValueError: object too deep for desired array
I can get it to work if I change the code to calculate a correlation for each column of a, like so:
for i in range(25):
    c2[i] = numpy.correlate(a[:,i], b)
However, the values in the c2 array are different from the output from Octave. Octave returns a 25x1 matrix of floats all less than 1. The values I get from NumPy are floats between -270 and 900.
I have tried to understand what the two algorithms are doing under the hood but have failed miserably. Can someone point out my logic failure?
From the MATLAB documentation for corr: rho = corr(X) returns a matrix of the pairwise linear correlation coefficient between each pair of columns in the input matrix X. rho = corr(X, Y) returns a matrix of the pairwise correlation coefficient between each pair of columns in the input matrices X and Y.
numpy.correlate simply returns the cross-correlation of two vectors. If you need to understand cross-correlation, start with http://en.wikipedia.org/wiki/Cross-correlation. The result is a function of the lag between the two sequences, with a maximum at the lag where the two data sets overlap best. It is not normalized, so it is not a correlation coefficient.
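To see why the raw numpy.correlate values can fall far outside [-1, 1], here is a small sketch with made-up data (the arrays x and y are illustrative, not the matrices from the question). With two equal-length 1-D inputs and the default mode, numpy.correlate returns a single unnormalized sum of products, so it scales with the data, whereas a correlation coefficient does not.

```python
import numpy as np

# Illustrative data only, not the asker's matrices.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Default mode is 'valid': equal-length inputs give a one-element array
# holding the plain sum of products (no centering, no normalization).
raw = np.correlate(x, y)
raw_scaled = np.correlate(10 * x, y)     # grows with the scale of the data

# The Pearson coefficient is centered and normalized, so rescaling x
# leaves it unchanged.
r = np.corrcoef(x, y)[0, 1]
r_scaled = np.corrcoef(10 * x, y)[0, 1]

print(raw, raw_scaled, r, r_scaled)
```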
From the MATLAB documentation for corrcoef: R = corrcoef(A) returns the matrix of correlation coefficients for A, where the columns of A represent random variables and the rows represent observations. R = corrcoef(A, B) returns coefficients between two random variables A and B.
It appears that there exists a numpy.corrcoef which computes the correlation coefficients, as desired. However, its interface is different from the Octave/MATLAB corr.
First of all, by default, the function treats rows as variables, with the columns being observations. To mimic the behavior of Octave/MATLAB, you can pass rowvar=0, which reverses this so that columns are variables and rows are observations.
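A minimal sketch of the orientation difference, using a small random array (the name data and its shape are made up for illustration). It only checks the shapes of the results, which is where the two conventions visibly differ.

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.rand(6, 3)   # 6 observations (rows) of 3 variables (columns)

# Default: every row is treated as a variable, giving a 6x6 matrix.
by_rows = np.corrcoef(data)

# rowvar=False (equivalently rowvar=0): every column is a variable,
# giving the 3x3 matrix that Octave/MATLAB users would expect.
by_cols = np.corrcoef(data, rowvar=False)

print(by_rows.shape)
print(by_cols.shape)
```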
Also, according to this answer, the numpy.cov function (which corrcoef uses internally, I assume) returns a 2x2 matrix when given two variables, with each entry holding a specific covariance:
cov(a,a) cov(a,b)
cov(a,b) cov(b,b)
As he points out, the [0][1] element is what you'd want for cov(a,b). Thus, perhaps something like this will work:
for i in range(25):
    c2[i] = numpy.corrcoef(a[:,i], b, rowvar=0)[0][1]
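If the loop turns out to be slow on a 40000x25 input, the same column-wise Pearson coefficients can be computed in one pass. The following is only a sketch under stated assumptions: corr_columns is a hypothetical helper name, the data is random stand-in data with fewer rows than in the question, and b is assumed to be a single column. It is checked against a per-column numpy.corrcoef loop rather than against Octave itself.

```python
import numpy as np

def corr_columns(a, b):
    """Pearson correlation of each column of a with the single column b.

    Hypothetical helper, not part of NumPy. Returns an (n_columns, 1) array,
    the same shape Octave's corr(a, b) gives for a 40000x1 b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1)   # flatten the n x 1 column
    a_c = a - a.mean(axis=0)                     # center every column of a
    b_c = b - b.mean()                           # center b
    num = a_c.T.dot(b_c)                         # sums of cross products
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum())
    return (num / den).reshape(-1, 1)

# Stand-in data (assumption): floats for a, integers for b.
rng = np.random.RandomState(0)
a = rng.rand(1000, 25)
b = rng.randint(0, 10, size=(1000, 1))

c2 = corr_columns(a, b)

# Reference: one corrcoef call per column, using 1-D inputs.
loop = np.array([np.corrcoef(a[:, i], b[:, 0])[0, 1]
                 for i in range(a.shape[1])])
```

The normalization factors (1/(n-1) in the covariance and in both standard deviations) cancel, which is why the sketch can work with plain sums.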
For reference, here are some excerpts from the documentation of the two functions that you had tried. It seems that they do completely different things.
Octave:
— Function File: corr (x, y)
Compute matrix of correlation coefficients.
If each row of x and y is an observation and each column is a variable, then the (i, j)-th entry of corr (x, y) is the correlation between the i-th variable in x and the j-th variable in y.
corr (x,y) = cov (x,y) / (std (x) * std (y))
If called with one argument, compute corr (x, x), the correlation between the columns of x.
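As a rough check of the formula quoted above for a single pair of variables, the sketch below computes cov(x, y) / (std(x) * std(y)) by hand in NumPy and compares it with numpy.corrcoef. The vectors x and y are made-up data, and ddof=1 is passed to numpy.std so that it matches the sample covariance that numpy.cov returns by default.

```python
import numpy as np

# Made-up data: y depends on x plus some noise.
rng = np.random.RandomState(1)
x = rng.rand(50)
y = 2.0 * x + rng.rand(50)

# numpy.cov returns a 2x2 matrix; [0, 1] is cov(x, y), normalized by n - 1.
cov_xy = np.cov(x, y)[0, 1]

# Use ddof=1 so the standard deviations use the same n - 1 normalization.
r_formula = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Library result for comparison.
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_formula, r_numpy)
```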
And NumPy:
numpy.correlate(a, v, mode='valid', old_behavior=False)
Cross-correlation of two 1-dimensional sequences.
This function computes the correlation as generally defined in signal processing texts:
z[k] = sum_n a[n] * conj(v[n+k])
with a and v sequences being zero-padded where necessary and conj being the conjugate.
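To make the sliding-sum behavior concrete, here is a small sketch with two symmetric, made-up sequences (a and v are illustrative only). It shows how the mode argument changes the number of lags returned and that the largest value appears at the lag where the two shapes line up.

```python
import numpy as np

# Illustrative signals: a contains the pattern v, centered.
a = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
v = np.array([1.0, 2.0, 1.0])

valid = np.correlate(a, v, mode='valid')  # only lags with complete overlap
same = np.correlate(a, v, mode='same')    # output as long as the longer input
full = np.correlate(a, v, mode='full')    # every lag, zero-padded at the edges

# Position of the largest sum of products in the 'full' output.
peak_index = int(np.argmax(full))

print(valid)
print(len(same), len(full), peak_index)
```

Each output element is a raw sum of products, not a value scaled to [-1, 1].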