Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate numbers with specific correlation [closed]

Tags:

r

How to generate a sequence of numbers, which would have a specific correlation (for example 0.56) and would consist of.. say 50 numbers with R program? Ty.

like image 559
Plunksna Avatar asked Nov 08 '12 14:11

Plunksna


3 Answers

Assuming you mean two normal/Gaussian vectors of values with correlation 0.56

We can use mvrnorm() from package MASS

require(MASS)
out <- mvrnorm(50, mu = c(0,0), Sigma = matrix(c(1,0.56,0.56,1), ncol = 2),
               empirical = TRUE)

which gives

> cor(out)
     [,1] [,2]
[1,] 1.00 0.56
[2,] 0.56 1.00

The empirical = TRUE bit is important otherwise the actual correlation achieved is subject to randomness too and will not be exactly the stated value with larger discrepancies for smaller samples.

Assuming you mean a lag 1 correlation of 0.56 & Gaussian random variables

For this one you can use the arima.sim() function:

> arima.sim(list(ar = 0.56), n = 50)
Time Series:
Start = 1 
End = 50 
Frequency = 1 
 [1]  0.62125233 -0.04742303  0.57468608 -0.07201988 -1.91416757 -1.11827563
 [7]  0.15718249  0.63217365 -1.24635896 -0.22950855 -0.79918784  0.31892842
[13]  0.33335688 -1.24328177 -0.79056890  1.08443057  0.55553819  0.33460674
[19] -0.33037659 -0.65244221  0.70461755  0.61450122  0.53731454  0.19563672
[25]  1.73945110  1.27119241  0.82484460  1.58382861  1.81619212 -0.94462052
[31] -1.36024898 -0.30964390 -0.94963216 -3.75725819 -1.77342095 -1.20963799
[37] -1.76325350 -1.20556172 -0.94684678 -0.85407649  0.14922226 -0.31109945
[43]  0.39456259  0.89610859 -0.70913792 -2.27954408 -1.14722464  0.39140446
[49]  0.66376227  1.63275483
like image 176
Gavin Simpson Avatar answered Oct 18 '22 23:10

Gavin Simpson


If you don't want to specify those matrices, other options are corgen from ecodist:

library("ecodist")
xy <- corgen(len = 50, r = 0.56, epsilon = 0.01)

Or rolling your own:

simcor <- function (n, xmean, xsd, ymean, ysd, correlation) {
    x <- rnorm(n)
    y <- rnorm(n)
    z <- correlation * scale(x)[,1] + sqrt(1 - correlation^2) * 
             scale(resid(lm(y ~ x)))[,1]
    xresult <- xmean + xsd * scale(x)[,1]
    yresult <- ymean + ysd * z
    data.frame(x=xresult,y=yresult)
}

Test

> r <- simcor(n = 50, xmean = 12, ymean = 30, xsd = 3, ysd = 4, correlation = 0.56)
> cor(r$x,r$y)
[1] 0.56
like image 9
MattBagg Avatar answered Oct 19 '22 00:10

MattBagg


Use rmvnorm from the mvtnorm package to sample from the multivariate normal distribution. For example for correlation of 0.56:

library("mvtnorm")
foo <- rmvnorm(10000,c(0,0),matrix(c(1,0.56,0.56,1),2,2))

Test:

> cor(foo[,1],foo[,2])
[1] 0.5611207
like image 7
Sacha Epskamp Avatar answered Oct 18 '22 23:10

Sacha Epskamp