
Create correlated variables from existing variable [closed]

Tags: r, correlation

Let's say I have a vector:

Q<-rnorm(10,mean=0,sd=20)

From this vector I would like to:

1. Create 10 variables (a1...a10) that each have a correlation above .5 (i.e. between .5 and 1) with Q.

The first part can be done with:

t1 <- sapply(1:10, function(x) jitter(Q, factor=100))

2. Give these variables (a1...a10) pre-specified correlations with each other. For example, some pairs should be correlated .8 and some -.2.

Can these two things be done?

I create a correlation matrix:

cor.table <- matrix(sample(c(0.9, -0.9), 2500, prob = c(0.8, 0.2), replace = TRUE), 50, 50)
cor.table[1, ] <- 0.55   # set the whole first row to 0.55
cor.table[, 1] <- 0.55   # set the whole first column to 0.55
diag(cor.table) <- 1

However, when I apply the excellent solution by @SprengMeister I get the error:

Error in eigen(cor.table)$values > 0 : 
  invalid comparison with complex values

Continued here: Eigenvalue decomposition of correlation matrix
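The error message is consistent with cor.table not being symmetric: its entries are sampled independently, so entry [i, j] generally differs from entry [j, i], and eigen() can return complex eigenvalues for a non-symmetric real matrix. The following is a minimal sketch of that diagnosis, not part of the original thread. The names m and sym are illustrative, and mirroring the upper triangle is only one assumed way to symmetrize.

```r
set.seed(1)

# Same construction as in the question: entries are drawn independently,
# so m[i, j] and m[j, i] are unrelated.
m <- matrix(sample(c(0.9, -0.9), 2500, prob = c(0.8, 0.2), replace = TRUE), 50, 50)
m[1, ] <- 0.55
m[, 1] <- 0.55
diag(m) <- 1

isSymmetric(m)               # a valid correlation matrix must be symmetric
is.complex(eigen(m)$values)  # complex eigenvalues are what break the `> 0` comparison

# Assumption: symmetrize by copying the upper triangle onto the lower one.
sym <- m
sym[lower.tri(sym)] <- t(sym)[lower.tri(sym)]

ev <- eigen(sym, symmetric = TRUE)$values  # real-valued for a symmetric matrix
isSymmetric(sym)
all(ev > 0)  # positive definiteness still has to hold and may well fail here
```

Symmetry only removes the complex values. A matrix filled with arbitrary entries of 0.9 and -0.9 is not guaranteed to be positive definite, so the positivity check can still fail even after symmetrizing.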

user1984076 asked Jan 13 '23 09:01


2 Answers

As a pointer toward a solution, you can add noise to the vector with R's jitter function:

set.seed(100)
t = rnorm(10,mean=0,sd=20)
t1 = jitter(t, factor = 100)
cor(t,t1)
[1] 0.8719447
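
The same idea can be extended to the ten variables asked for in the question. This is a rough sketch under assumptions: the names a and cor_with_Q are illustrative, and jitter() neither guarantees a correlation above .5 nor controls the correlations among the copies, so the result should be checked rather than assumed.

```r
set.seed(100)
Q <- rnorm(10, mean = 0, sd = 20)

# Ten independently jittered copies of Q, one per column
a <- sapply(1:10, function(i) jitter(Q, factor = 100))
colnames(a) <- paste0("a", 1:10)

# Correlation of each copy with Q
cor_with_Q <- cor(Q, a)[1, ]
round(cor_with_Q, 2)

# Check the requirement instead of relying on it
all(cor_with_Q > 0.5)
```

If the check fails, a smaller factor adds less noise and pushes the correlations toward 1.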
topchef answered Jan 30 '23 14:01


To generate data with a prescribed correlation (or variance), you can start with random data, and rescale it using the Cholesky decomposition of the desired correlation matrix.

# Sample data
Q <- rnorm(10, mean=0, sd=20)
desired_correlations <- matrix(c(
  1, .5, .6, .5,
  .5, 1, .2, .8,
  .6, .2, 1, .5,
  .5, .8, .5, 1 ), 4, 4 )
stopifnot( eigen( desired_correlations )$values > 0 )

# Random data, with Q in the first column
n <- length(Q)
k <- ncol(desired_correlations)
x <- matrix( rnorm(n*k), ncol=k )
x[,1] <- Q

# Rescale: first make the sample covariance matrix of x the identity,
# then impose the desired correlation matrix.
y <- x %*% solve(chol(var(x))) %*% chol(desired_correlations)
var(y)      # Equals desired_correlations (up to rounding)
y[,1] <- Q  # The first column was only rescaled, so restoring Q does not affect the correlations
cor(y)      # Desired correlation matrix
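
The steps above can be wrapped in a small function and applied to a target closer to the one in the question. This is a sketch under assumptions, not the answer's own code: correlate_with is a hypothetical helper name, and the target matrix (0.55 with Q, then .8 within pairs and -.2 across pairs) is an illustrative choice that happens to be positive definite. The method makes the sample correlations match the target exactly, which requires more observations than variables.

```r
# Hypothetical helper: returns a matrix whose first column is Q and whose
# sample correlation matrix equals `target`.
correlate_with <- function(Q, target) {
  stopifnot(isSymmetric(target),
            all(eigen(target, symmetric = TRUE)$values > 0))
  n <- length(Q)
  k <- ncol(target)
  stopifnot(n > k)  # otherwise var(x) is singular and chol() fails

  # Q in the first column, independent noise in the others
  x <- cbind(Q, matrix(rnorm(n * (k - 1)), ncol = k - 1))

  # Whiten x, then impose the target. Both Cholesky factors are upper
  # triangular, so the first column of y is just a positive multiple of Q.
  y <- x %*% solve(chol(var(x))) %*% chol(target)
  y[, 1] <- Q
  y
}

set.seed(42)
Q <- rnorm(100, mean = 0, sd = 20)

# Assumed target: every variable correlates 0.55 with Q;
# a1-a2 and a3-a4 correlate .8, all other pairs -.2.
target <- matrix(c(
  1,    .55, .55, .55, .55,
  .55,  1,   .8,  -.2, -.2,
  .55,  .8,  1,   -.2, -.2,
  .55, -.2, -.2,  1,   .8,
  .55, -.2, -.2,  .8,  1 ), 5, 5)

y <- correlate_with(Q, target)
max(abs(cor(y) - target))  # difference from the target, should be near zero
```

The initial checks mirror the stopifnot() in the answer: a target that is not symmetric or not positive definite is rejected before any rescaling is attempted.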
Vincent Zoonekynd answered Jan 30 '23 13:01