Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each other at a given level of Pearson r. Here is the kicker: Array X and Array Y must be uniform distributions.

I can do this with a normal distribution, but transforming the values without skewing the distribution has me stumped. I tried re-ordering the values in the arrays to increase the correlation, but I will never get arrays correlated at 1.00 or -1.00 just by sorting.

Any ideas?

--

here is the AS3 code for random correlated gaussians, to get the wheels turning:

public static function nextCorrelatedGaussians(r:Number):Array{             
         var d1:Number;
         var d2:Number;
         var n1:Number;
         var n2:Number;
         var lambda:Number;
         var r:Number;
         var arr:Array = new Array();
         var isNeg:Boolean; 

        if (r<0){
            r *= -1;
              isNeg=true;
        }            
        lambda= (   (r*r)  -  Math.sqrt(  (r*r) - (r*r*r*r)  )     )   /   ((  2*r*r ) - 1  );

        n1 = nextGaussian();
        n2 = nextGaussian();           
        d1 = n1;            
        d2 = ((lambda*n1) + ((1-lambda)*n2)) / Math.sqrt( (lambda*lambda) + (1-lambda)*(1-lambda));

        if (isNeg) {d2*= -1}           
        arr.push(d1);
        arr.push(d2);
        return arr;
    }
like image 327
Gideon Avatar asked Nov 11 '09 20:11

Gideon


People also ask

How do you generate a correlated random number in Excel?

Here's how. Create a random series of values in column A and another random series from the same distribution in column B. and drag down to form the third series. The resulting correlation coefficient betwen the first and third series will be very close to the desired correlation.

Can random variables be correlated?

If the random variables are correlated then this should yield a better result, on the average, than just guessing. We are encouraged to select a linear rule when we note that the sample points tend to fall about a sloping line.


2 Answers

Here is an implementation of of twolfe18's algorithm written in Actionscript 3:

for (var j:int=0; j < size; j++) {
 xValues[i]=Math.random());
}
var varX:Number = Util.variance(xValues);
var varianceE:Number = 1/(r*varX) - varX;

for (var i:int=0; i < size; i++) {
 yValues[i] = xValues[i] + boxMuller(0, Math.sqrt(varianceE));
}

boxMuller is just a method that generates a random Gaussian with the arguments (mean, stdDev). size is the size of the distribution.

Sample output

Target p: 0.8
Generated p: 0.04846346291280387
variance of x distribution: 0.0707786253165176
varianceE: 17.589920412141158

As you can see I'm still a ways off. Any suggestions?

like image 37
Gideon Avatar answered Oct 17 '22 16:10

Gideon


I ended up writing a short paper on this

It doesn't include your sorting method (although in practice I think it's similar to my first method, in a roundabout way), but does describe two ways that don't require iteration.

like image 166
andrew cooke Avatar answered Oct 17 '22 15:10

andrew cooke