Correlation between two random signals changes each time it is calculated

I have two signals in MATLAB, say

a = randn(1,1e6);
b = randn(1,1e6);

I am finding the correlation between them as follows:

R=corrcoef(a,b);
r = R(2,1);

Now each time I run my code, the correlation coefficient is different. I even tried increasing the number of samples (from 1e6 to higher values), but the value still changes from run to run. Is there some other way to find the correlation coefficient between such signals?

asked Oct 30 '25 10:10 by rmb


2 Answers

It seems you are confusing the sample correlation coefficient with the theoretical correlation coefficient. The former is a random value resulting from the (random) signals generated in the simulation; the latter is a number computed from the statistical model of the signal generation process.

What you are computing in your code is the sample correlation coefficient, which depends on the actual signals that are randomly generated (a and b in your code). Those signals are realizations of stochastic processes (white Gaussian processes, in your case, because you use randn).
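To make concrete that the sample correlation coefficient is simply a function of the particular realizations a and b, here is a minimal sketch that computes it directly from its (Pearson) definition and compares it with corrcoef. The sample size n and the variable names are arbitrary illustration choices.

```matlab
% Sample (Pearson) correlation coefficient computed by hand, compared with corrcoef.
n = 1e5;                      % arbitrary sample size for illustration
a = randn(1,n);
b = randn(1,n);

da = a - mean(a);             % remove the sample mean of each signal
db = b - mean(b);
r_manual = sum(da.*db) / sqrt(sum(da.^2) * sum(db.^2));

R = corrcoef(a,b);
r_builtin = R(2,1);

% Both numbers agree, and both change whenever a and b are regenerated.
fprintf('manual: %.6f   corrcoef: %.6f\n', r_manual, r_builtin);
```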

The theoretical correlation coefficient, on the other hand, is determined by the statistical model of the two stochastic processes that generate your signals. So it is not obtained from simulations (as in your code) but computed mathematically.

The theoretical correlation in your case is 0, because the stochastic processes are independent. Note that I know this from the code (from how you generate the signals), not from the actual values the code happens to generate. That's what I mean when I say it's a theoretical value: it's computed from knowledge that you have about how the actual signals are going to be generated.
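The same distinction holds when the theoretical value is not zero. As a hedged illustration (the value rho = 0.6 and this construction are my own choice, not part of the question), mixing a with an independent Gaussian signal gives a pair whose theoretical correlation is known in advance to be rho: both signals have unit variance and Cov(a,b) = rho.

```matlab
% Two zero-mean, unit-variance Gaussian signals with theoretical correlation rho.
rho = 0.6;                                 % illustrative value
n = 1e6;
a = randn(1,n);
b = rho*a + sqrt(1 - rho^2)*randn(1,n);    % Var(b) = rho^2 + (1 - rho^2) = 1

R = corrcoef(a,b);
r = R(2,1);   % sample value: close to rho, but different on every run
fprintf('theoretical: %.3f   sample: %.4f\n', rho, r);
```

Here rho comes from the generation code itself, while r comes from the data that code happened to produce.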

The sample correlation can be used as an estimate of the theoretical correlation, and that estimate becomes better as the signal size increases. This is the law of large numbers. So, the larger you set the sample size (1e6 in your code), the more concentrated the result (sample correlation coefficient) will be around 0 (theoretical correlation coefficient). It will, however, never be exactly 0, nor identical from one run to the next.
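How fast does this concentration happen? For two independent Gaussian signals of length n (the case in the question), a standard result for the sample correlation coefficient r gives its mean and variance exactly:

```latex
\mathbb{E}[r] = 0, \qquad
\operatorname{Var}(r) = \frac{1}{n-1}, \qquad
\operatorname{std}(r) = \frac{1}{\sqrt{n-1}} \approx \frac{1}{\sqrt{n}}
```

With n = 10^6 the typical run-to-run fluctuation is therefore about 10^{-3}. Because the spread shrinks only as 1/sqrt(n), quadrupling the number of samples merely halves it, which is why increasing the sample size reduces the variation but can never remove it.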

To illustrate this, I have done 10 sets of 1000 simulations, each set of a different sample size. For each sample size I thus collect 1000 different values of the sample correlation coefficient and compute a histogram to see how these values are distributed. The figure confirms that as sample size increases the histograms become narrower (and taller), indicating that the sample correlation coefficient is more concentrated around the theoretical value of 0.

[Figure: histograms of the sample correlation coefficient, 1000 values per sample size, for sample sizes 1e5 to 1e6]


The code used for generating the figure (Matlab R2015b) is:

S = 1e5:1e5:1e6; % sample sizes
N = 1000; % number of repetitions to generate each histogram
binlimits = [-.015 .015]; % set manually depending on S
B = 31; % number of bins in the histogram
stretch = 7; % stretch factor for plotting the histograms
result = NaN(numel(S),B); % preallocate
for m = 1:numel(S)
    cc = NaN(1,N); % one sample correlation coefficient per repetition
    for n = 1:N
        a = randn(1,S(m));
        b = randn(1,S(m));
        c = corrcoef(a,b);
        cc(n) = c(2,1); % sample correlation coefficient
    end
    [counts, edges] = histcounts(cc,B,'BinLimits',binlimits,'Normalization','pdf');
    result(m,:) = counts; % histogram of correlation coefficient for this sample size
end
bins = (edges(1:end-1) + edges(2:end))/2; % bin centers, used as axis for the histograms
resultbar = NaN(numel(S)*stretch,B);
resultbar(1:stretch:end,:) = result; % separate the histograms for better visualization
h = bar3(bins, resultbar.'); % plot histograms: one series (column) per sample size
set(gca,'XTick',1:stretch:numel(h),'XTickLabel',arrayfun(@num2str,S,'UniformOutput',false))
delete(h(mod(0:numel(h)-1,stretch)>0)) % remove the empty (all-NaN) spacer series
xlabel('Sample size')
ylabel('Sample correlation coefficient')
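If you prefer numbers to a figure, the hedged sketch below repeats the experiment with smaller sample sizes (chosen only so it runs quickly) and compares the empirical standard deviation of the sample correlation coefficient with 1/sqrt(n), the standard large-sample spread of r for independent Gaussian signals.

```matlab
% Empirical spread of the sample correlation coefficient vs. 1/sqrt(n).
S = [1e2 1e3 1e4];    % sample sizes (smaller than in the figure, for speed)
N = 2000;             % repetitions per sample size
emp_std = zeros(size(S));
for m = 1:numel(S)
    cc = zeros(1,N);
    for k = 1:N
        c = corrcoef(randn(1,S(m)), randn(1,S(m)));
        cc(k) = c(2,1);   % sample correlation coefficient of this repetition
    end
    emp_std(m) = std(cc);
end
disp([S; emp_std; 1./sqrt(S)])   % rows: sample size, empirical std, 1/sqrt(n)
```

Each tenfold increase in sample size should shrink the empirical standard deviation by roughly a factor of sqrt(10).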
answered Nov 02 '25 23:11 by Luis Mendo


By default, randn does not produce the same numbers every time you call it. If you want to generate the same set of random numbers for variables a and b every time you run your script, you have to tell MATLAB by setting the random number generator accordingly with rng. I wrote a small function test with the local function call_randn to illustrate this. test calls the random generator 3 times, and you'll see that it returns the same r for all 3 calls. However, each time you call test these numbers will be different.

% test.m
function r = test()
    rng('default')  % Initialise the random generator.
    sa = rng;       % Store current generator settings in sa.
    rng('shuffle')  % Seed the generator from the current time.
    sb = rng;       % Store the new generator settings in sb.
    n = 10;         % Number of random numbers to be generated.

    a = zeros(3,n); % Preallocate.
    b = zeros(3,n);
    r = zeros(1,3);
    for i = 1:3
        [a(i,:),b(i,:)] = call_randn(sa,sb,n);
        R = corrcoef(a(i,:),b(i,:));
        r(i) = R(2,1);
    end
end

function [a,b] = call_randn(sa,sb,n)
    rng(sa);         % Load generator settings for a.
    a = randn(1,n);
    rng(sb);         % Load generator settings for b.
    b = randn(1,n);
end
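If all you need is a repeatable number, a simpler option is to seed the generator with rng once before generating the signals. This is a minimal sketch; the seed value 0 is arbitrary.

```matlab
% Seeding the generator makes a, b and hence r identical on every run.
rng(0);                      % arbitrary fixed seed
a = randn(1,1e6);
b = randn(1,1e6);
R = corrcoef(a,b);
r1 = R(2,1);

rng(0);                      % reset to the same seed
a = randn(1,1e6);
b = randn(1,1e6);
R = corrcoef(a,b);
r2 = R(2,1);

fprintf('r1 = %.6f, r2 = %.6f\n', r1, r2);   % the two values are identical
```

Note that this only makes the value repeatable: r is still a single realization of the sample correlation coefficient, not the theoretical value of 0.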
answered Nov 03 '25 00:11 by mabe