scipy - generate random variables with correlations

I'm working on a basic Monte Carlo simulator in Python for some project management risk modeling (basically Crystal Ball / @Risk, but in Python).

I have a set of n random variables (all scipy.stats instances). I know that I can use rv.rvs(size=k) to generate k independent observations from each of these n variables.
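For example (the two distributions below are just placeholders):

    from scipy import stats

    # two example variables; replace with your own scipy.stats instances
    cost = stats.triang(c=0.5, loc=100, scale=50)
    duration = stats.norm(loc=30, scale=5)

    k = 1000
    cost_draws = cost.rvs(size=k)          # k independent observations
    duration_draws = duration.rvs(size=k)  # k independent observations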

I'd like to introduce correlations among the variables by specifying an n x n positive semi-definite correlation matrix.

Is there a clean way to do this in scipy?

What I've Tried

This answer and this answer seem to indicate that copulas are the answer, but I don't see any reference to them in scipy.

This link seems to implement what I'm looking for, but I'm not sure if scipy has this functionality implemented already. I'd also like it to work for non-normal variables.

It seems that the method described in the Iman and Conover paper is the standard approach.

asked Jan 01 '15 by MikeRand



2 Answers

If you just want correlation through a Gaussian copula (*), then it can be done in a few steps with numpy and scipy.

  • create multivariate normal random variables with the desired covariance using numpy.random.multivariate_normal, giving a (nobs, k_variables) array

  • apply scipy.stats.norm.cdf to each column/variable to transform the normal samples into uniform random variables, giving uniform marginal distributions

  • apply dist.ppf to transform the uniform marginals to the desired distribution, where dist can be any of the distributions in scipy.stats (see the sketch below)

(*) The Gaussian copula is only one choice, and it is not the best when we are interested in tail behavior, but it is the easiest to work with; see for example http://archive.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all
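A minimal sketch of the three steps (the two variables, the 0.7 correlation, and the gamma/lognormal marginals below are just placeholder assumptions; substitute your own scipy.stats distributions):

    import numpy as np
    from scipy import stats

    nobs = 10_000
    corr = np.array([[1.0, 0.7],
                     [0.7, 1.0]])

    # 1. correlated standard normals, shape (nobs, k_variables)
    z = np.random.multivariate_normal(mean=np.zeros(2), cov=corr, size=nobs)

    # 2. transform each column to uniform marginals via the normal cdf
    u = stats.norm.cdf(z)

    # 3. transform the uniforms to the desired marginals via each distribution's ppf
    x0 = stats.gamma(a=2.0, scale=1.0).ppf(u[:, 0])
    x1 = stats.lognorm(s=0.5).ppf(u[:, 1])

    samples = np.column_stack([x0, x1])
    print(np.corrcoef(samples, rowvar=False))

Note that the Pearson correlation of the transformed samples will be close to, but not exactly, the 0.7 specified for the normals; what the Gaussian copula preserves is the rank (Spearman) correlation, since the cdf/ppf transforms are monotone.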

Two references:

https://stats.stackexchange.com/questions/37424/how-to-simulate-from-a-gaussian-copula

http://www.mathworks.com/products/demos/statistics/copulademo.html

(I might have done this a while ago in Python, but I don't have any scripts or functions right now.)

answered Oct 21 '22 by Josef


It seems like a rejection-based sampling method such as the Metropolis-Hastings algorithm is what you want. Scipy can implement such methods with its scipy.optimize.basinhopping function.

Rejection-based sampling methods allow you to draw samples from any given probability distribution. The idea is that you draw random samples from another "proposal" pdf that is easy to sample from (such as a uniform or Gaussian distribution) and then use a random test to decide whether this sample from the proposal distribution should be "accepted" as representing a sample of the desired distribution.

The remaining tricks will then be:

  1. Figure out the form of the joint N-dimensional probability density function that has marginals of the form you want along each dimension, but with the correlation matrix that you want. This is easy to do for the Gaussian distribution, where the desired correlation matrix and mean vector are all you need to define the distribution. If your marginals have a simple expression, you can probably find this pdf with some straightforward-but-tedious algebra. This paper cites several others which do what you are talking about, and I'm certain that there are many more.

  2. Formulate a function for basinhopping to minimize such that its accepted "minima" amount to samples of the pdf you have defined.

Given the results of (1), (2) should be straightforward.
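For illustration, here is a minimal, self-contained Metropolis-Hastings sketch (it samples a toy correlated bivariate normal standing in for the joint pdf from step 1, and uses a hand-rolled accept/reject loop rather than basinhopping):

    import numpy as np

    def log_pdf(x, corr=0.7):
        # toy target from step 1: standard bivariate normal with correlation corr
        cov = np.array([[1.0, corr], [corr, 1.0]])
        return -0.5 * x @ np.linalg.inv(cov) @ x

    def metropolis_hastings(log_pdf, x0, n_samples=10_000, step=0.5, seed=None):
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        samples = np.empty((n_samples, x.size))
        for i in range(n_samples):
            # symmetric Gaussian proposal around the current point
            proposal = x + rng.normal(scale=step, size=x.size)
            # accept with probability min(1, p(proposal) / p(x))
            if np.log(rng.uniform()) < log_pdf(proposal) - log_pdf(x):
                x = proposal
            samples[i] = x
        return samples

    samples = metropolis_hastings(log_pdf, x0=np.zeros(2))
    print(np.corrcoef(samples, rowvar=False))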

answered Oct 21 '22 by stochastic