I need to generate random values for two beta-distributed variables that are correlated using SAS. The two variables of interest are characterized as follows:
X1
has mean = 0.896
and variance = 0.001
.
X2
has mean = 0.206
and variance = 0.004
.
For X1
and X2
, p = 0.5, where p is the correlation coefficient.
Using SAS, I understand how to generate a random number specifying a beta distribution using the function X = RAND('BETA', a, b)
, where a and b are the two shape parameters for a variable X that can be calculated from the mean and variance. However, I want to generate values for both X1
and X2
simultaneously while specifying that they are correlated at p = 0.5.
Using SAS, I understand how to generate a random number specifying a beta distribution using the function X = RAND ('BETA', a, b), where a and b are the two shape parameters for a variable X that can be calculated from the mean and variance.
If we have 2 normal, uncorrelated random variables $X_1, X_2$ then we can create 2 correlated random variables with the formula $Y=ho X_1+ \sqrt{1-ho^2} X_2$ and then $Y$ will have a correlation $ho$ with $X_1$.
To generate correlated normally distributed random samples, one can first generate uncorrelated samples, and then multiply them by a matrix C such that CCT = R, where R is the desired covariance matrix. C can be created, for example, by using the Cholesky decomposition of R, or from the eigenvalues and eigenvectors of R. In :
for the MVN data in order to have the correlation ot the beta variables be 0.5. For Spearman (rank) correlations, no adjustment is necessary.
This solution is based on modified methods used from Chapter 9 of Simulating Data with SAS by Rick Wicklin.
In this particular example, I first have to define variable means, variances, and shape-parameters (alpha, beta) that are associated with the beta distribution:
data beta_corr_vars;
input x1 var1 x2 var2; *mean1, variance1, mean2, variance2;
*calculate shape parameters alpha and beta from means and variances;
alpha1 = ((1 - x1) / var1 - 1/ x1) * x1**2;
alpha2 = ((1 - x2) / var2 - 1/ x2) * x2**2;
beta1 = alpha1 * (1 / x1 - 1);
beta2 = alpha2 * (1 / x2 - 1);
*here are the means and variances referred to in the original question;
datalines;
0.896 0.001 0.206 0.004
;
run;
proc print data = beta_corr_vars;
run;
Once these variables are defined:
proc iml;
use beta_corr_vars; read all;
call randseed(12345);
N = 10000; *number of random variable sets to generate;
*simulate bivariate normal data with a specified correlation (here, rho = 0.5);
Z = RandNormal(N, {0, 0}, {1 0.5, 0.5 1}); *RandNormal(N, Mean, Cov);
*transform the normal variates into uniform variates;
U = cdf("Normal", Z);
*From here, we can obtain beta variates for each column of U by;
*applying the inverse beta CDF;
x1_beta = quantile("Beta", U[,1], alpha1, beta1);
x2_beta = quantile("Beta", U[,2], alpha2, beta2);
X = x1_beta || x2_beta;
*check adequacy of rho values--they approach the desired values with more sims (N);
rhoZ = corr(Z)[1,2];
rhoX = corr(X)[1,2];
print X;
print rhoZ rhoX;
Thank you to all users who contributed to this answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With