
Generating random correlated x and y points using Numpy

I'd like to generate correlated arrays of x and y coordinates in order to test various matplotlib plotting approaches, but I can't get numpy.random.multivariate_normal to give me the samples I want. Ideally, I want my x values between -0.51 and 51.2, and my y values between 0.33 and 51.6 (though I suppose equal ranges would be OK, since I can constrain the plot afterwards), but I'm not sure what mean (0, 0?) and covariance values I should pass to the function to get these samples.

urschrei asked Sep 08 '13 12:09


1 Answer

As the name implies, numpy.random.multivariate_normal draws samples from a normal distribution, which means there is a nonzero probability of finding points outside any given interval. You can generate correlated uniform distributions instead, but this is a little more convoluted. Take a look here for two possible methods.
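One common way to get correlated uniform samples, which is not necessarily one of the two linked methods, is a Gaussian copula: draw correlated standard normals, push them through the normal CDF to get uniforms on [0, 1], then rescale to the target ranges. The sketch below assumes that approach; the variable names and the fixed seed are illustrative only. Note that `corr` is the correlation of the underlying normals, so the correlation of the resulting uniforms comes out slightly lower.

```python
import math
import numpy as np

# Target ranges from the question
x_min, x_max = -0.51, 51.2
y_min, y_max = 0.33, 51.6
corr = 0.8  # correlation of the underlying normals (assumed value)

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

# Step 1: correlated standard normal samples, shape (1000, 2)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], 1000)

# Step 2: standard normal CDF maps each column to a uniform on [0, 1]
erf = np.vectorize(math.erf)
u = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))

# Step 3: rescale the uniforms to the requested intervals
x = x_min + (x_max - x_min) * u[:, 0]
y = y_min + (y_max - y_min) * u[:, 1]
```

Every point produced this way lies inside the requested box, so no filtering is needed, at the cost of the samples no longer being normally distributed.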

If you want to go with the normal distribution, you can choose the sigmas so that each half-interval corresponds to 3 standard deviations (you can also filter out the points that fall outside, if needed). This way about 99.7% of the values along each axis fall inside the interval, e.g.:

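import numpy as np
from matplotlib.pyplot import scatter, show

# Target intervals for x and y
xx = np.array([-0.51, 51.2])
yy = np.array([0.33, 51.6])
# Center the distribution on the midpoint of each interval
means = [xx.mean(), yy.mean()]
# For two endpoints, std() equals the half-interval, so the endpoints sit at 3 sigma
stds = [xx.std() / 3, yy.std() / 3]
corr = 0.8         # correlation
covs = [[stds[0]**2          , stds[0]*stds[1]*corr],
        [stds[0]*stds[1]*corr,           stds[1]**2]]

# Draw 1000 samples; transpose so that m[0] holds x and m[1] holds y
m = np.random.multivariate_normal(means, covs, 1000).T
scatter(m[0], m[1])
show()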
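
The filtering step mentioned above can be sketched with a boolean mask. This is a minimal example under the same means and covariances as the snippet above; the names `inside` and `filtered` and the fixed seed are illustrative. Discarding points truncates the distribution slightly, so the result is no longer exactly normal and contains somewhat fewer than 1000 points.

```python
import numpy as np

# Target intervals from the question
x_min, x_max = -0.51, 51.2
y_min, y_max = 0.33, 51.6

# Midpoints as means, one sixth of each interval width as sigma (3 sigma per side)
means = [(x_min + x_max) / 2, (y_min + y_max) / 2]
stds = [(x_max - x_min) / 6, (y_max - y_min) / 6]
corr = 0.8
covs = [[stds[0]**2, stds[0] * stds[1] * corr],
        [stds[0] * stds[1] * corr, stds[1]**2]]

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
samples = rng.multivariate_normal(means, covs, 1000)  # shape (1000, 2)

# Keep only the rows whose x and y both fall inside the requested box
inside = ((samples[:, 0] >= x_min) & (samples[:, 0] <= x_max) &
          (samples[:, 1] >= y_min) & (samples[:, 1] <= y_max))
filtered = samples[inside]
x, y = filtered.T
```

If exactly 1000 points are required, one option is to oversample and keep the first 1000 rows of `filtered`.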


elyase answered Sep 23 '22 16:09