What is a fast way to compute column-by-column correlation in MATLAB?

I have two very large matrices (60x25000) and I'd like to compute the correlation only between corresponding columns of the two matrices. For example:

corrVal(1) = corr(mat1(:,1), mat2(:,1));
corrVal(2) = corr(mat1(:,2), mat2(:,2));
...
corrVal(i) = corr(mat1(:,i), mat2(:,i));

For smaller matrices I can simply use:

   colCorr = diag( corr( mat1, mat2 ) );

but this doesn't work for very large matrices because I run out of memory. I've considered slicing up the matrices, computing the correlations per slice and then combining the results, but it seems like a waste to compute correlations between column combinations that I'm not actually interested in.
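
To put the memory problem in numbers, here is a back-of-the-envelope estimate. It is only a sketch, assuming corr(mat1, mat2) returns a full 25000-by-25000 matrix in double precision; the variable names are made up for illustration:

```matlab
% Rough size of the full cross-correlation matrix corr(mat1, mat2),
% assuming double precision (8 bytes per element).
nCol = 25000;                  % columns per matrix, from the question
bytesFull = nCol^2 * 8;        % 25000-by-25000 doubles
gbFull = bytesFull / 1e9;      % size in GB; only the diagonal is actually needed
```

Only the 25000 diagonal entries of that matrix are of interest here.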

Is there a quick way to directly compute only what I'm interested in?

Edit: I've used a loop in the past, but it's just way too slow:

mat1 = rand(60,5000);
mat2 = rand(60,5000);
nCol = size(mat1,2);
corrVal = zeros(nCol,1);

tic;
for i = 1:nCol
    corrVal(i) = corr(mat1(:,i), mat2(:,i));
end
toc;

This takes ~1 second

tic;
corrVal = diag(corr(mat1,mat2));
toc;

This takes ~0.2 seconds

asked Feb 13 '12 15:02

slayton

2 Answers

I can obtain a roughly 100x speedup by computing it by hand.

An=bsxfun(@minus,A,mean(A,1)); %%% zero-mean
Bn=bsxfun(@minus,B,mean(B,1)); %%% zero-mean
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1))); %% L2-normalization
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1))); %% L2-normalization
C=sum(An.*Bn,1); %% correlation
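
On MATLAB R2016b or later, the same steps could probably be written without bsxfun, because arithmetic operators expand singleton dimensions implicitly. The following is a minimal sketch under that version assumption, not the code that was timed in this answer:

```matlab
% Sketch assuming MATLAB R2016b or later (implicit expansion).
A = rand(60, 25000);
B = rand(60, 25000);

An = A - mean(A, 1);                 % subtract each column's mean
Bn = B - mean(B, 1);
An = An ./ sqrt(sum(An.^2, 1));      % scale each column to unit L2 norm
Bn = Bn ./ sqrt(sum(Bn.^2, 1));
C  = sum(An .* Bn, 1);               % 1-by-25000 row vector of column-wise correlations
```

As with corr, a constant column has zero norm after centering, so its entry in C comes out as NaN.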

You can compare the two approaches with this code:

A=rand(60,25000);
B=rand(60,25000);

tic;
C=zeros(1,size(A,2));
for i = 1:size(A,2)
    C(i)=corr(A(:,i), B(:,i));
end
toc; 

tic
An=bsxfun(@minus,A,mean(A,1));
Bn=bsxfun(@minus,B,mean(B,1));
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1)));
C2=sum(An.*Bn,1);
toc
mean(abs(C-C2)) %% difference between methods

Here are the computing times:

Elapsed time is 10.822766 seconds.
Elapsed time is 0.119731 seconds.

The difference between the two results is very small:

mean(abs(C-C2))

ans =
  3.0968e-17

EDIT: explanation

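bsxfun applies an element-wise binary operation with singleton expansion: a 1-by-n row vector is combined with every row of an m-by-n matrix, so each column gets its own value (or each row, if the vector is a column vector).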

An=bsxfun(@minus,A,mean(A,1));

This line subtracts (@minus) the mean of each column (mean(A,1)) from that column of A. In other words, it makes the columns of A zero-mean.

An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));

This line multiplies (@times) each column by the inverse of its L2 norm, so every column ends up L2-normalized (unit length).

Once the columns are zero-mean and L2-normalized, the correlation is just the dot product of each column of An with the corresponding column of Bn. So you multiply them element-wise, An.*Bn, and then sum each column: sum(An.*Bn,1).
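
Why this works can be written out as a formula. The notation is introduced here for illustration and is not from the code above: a_{ij} and b_{ij} are the entries of A and B, n is the number of rows, and a bar denotes a column mean. The Pearson correlation of column j is then

```latex
r_j
= \frac{\sum_{i=1}^{n} (a_{ij} - \bar{a}_j)(b_{ij} - \bar{b}_j)}
       {\sqrt{\sum_{i=1}^{n} (a_{ij} - \bar{a}_j)^2}\,
        \sqrt{\sum_{i=1}^{n} (b_{ij} - \bar{b}_j)^2}}
= \sum_{i=1}^{n} \tilde{a}_{ij}\,\tilde{b}_{ij},
\qquad
\tilde{a}_{ij} = \frac{a_{ij} - \bar{a}_j}{\sqrt{\sum_{k=1}^{n} (a_{kj} - \bar{a}_j)^2}},
\quad
\tilde{b}_{ij} = \frac{b_{ij} - \bar{b}_j}{\sqrt{\sum_{k=1}^{n} (b_{kj} - \bar{b}_j)^2}}
```

Here the tilde quantities correspond to the entries of An and Bn. The 1/(n-1) factors of the sample covariance and the sample standard deviations cancel, which is why they do not appear.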

answered Oct 18 '22 23:10

Oli

I think the obvious loop might be good enough for your size of problem. On my laptop it takes less than 6 seconds to do the following:

A = rand(60,25000);
B = rand(60,25000);
n = size(A,1);
m = size(A,2);

corrVal = zeros(1,m);
for k=1:m
    corrVal(k) = corr(A(:,k),B(:,k));
end

answered Oct 18 '22 23:10

Ian Hincks

Tags: matrix, matlab, correlation