How to simulate a strong correlation of data with R

Tags:

r

ggpubr

Sometimes I try to simulate data by using the rnorm function, which I have done below:

mom.iq <- rnorm(n=1000,
                mean=120,
                sd=15)
kid.score <- rnorm(n=1000,
                   mean=45,
                   sd=20)
df <- data.frame(mom.iq,
                 kid.score)

But when I plot the data like this, the two variables usually end up essentially uncorrelated:

library(ggpubr)

ggscatter(df,
          x="mom.iq",
          y="kid.score")+
  geom_smooth(method = "lm")

[Image: lm.pic, the scatter plot with fitted regression line produced by the code above]

However, I would like to simulate something with a stronger correlation if possible. Is there an easy way to do this within R? I'm aware that I could produce my own values manually, but that's not practical for creating large samples.

Shawn Hemelstrand asked Nov 15 '25 13:11

1 Answer

You are generating two independent variables, so it is expected that they are uncorrelated. Instead, you can generate one variable as a function of the other:

# Set a seed so the values are reproducible
set.seed(12345)

# Generate the independent variable
x <- rnorm(n = 1000, mean = 120, sd = 15)
# Generate the dependent variable as a linear function of x plus noise
y <- 3 * x + 6 + rnorm(n = 1000, mean = 0, sd = 5)

I used 3 and 6, but you can choose any slope a and intercept b to get a linear dependence of the form y = a*x + b.
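As a rough sketch of how this could look with the variables from the question: the slope a = 0.5 and the noise sd of 5 below are values assumed purely for illustration, and the intercept b is picked so that kid.score keeps a mean near 45.

```r
set.seed(12345)  # reproducible draws

a <- 0.5           # slope (assumed for illustration)
b <- 45 - a * 120  # intercept chosen so that mean(kid.score) is near 45

mom.iq <- rnorm(n = 1000, mean = 120, sd = 15)
# kid.score depends linearly on mom.iq, plus independent noise
kid.score <- a * mom.iq + b + rnorm(n = 1000, mean = 0, sd = 5)

df <- data.frame(mom.iq, kid.score)

# Sample correlation between the two simulated variables
r <- cor(df$mom.iq, df$kid.score)
print(r)
```

The resulting df can be passed to the same ggscatter call as in the question to see the relationship.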

The term rnorm(n=1000, mean = 0, sd = 5) adds some variability so that x and y are not perfectly correlated. To get more strongly correlated data, reduce the standard deviation (sd); to get a weaker correlation, increase it.
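To see the effect of the noise sd, a small sketch like the following may help. The grid of sd values is arbitrary, and the expected.cor column uses the standard result for a linear model with independent noise, cor = a*sd_x / sqrt(a^2*sd_x^2 + sd_noise^2), which is added here for comparison and is not part of the original answer.

```r
set.seed(12345)

a <- 3
b <- 6
x <- rnorm(n = 1000, mean = 120, sd = 15)

# Noise standard deviations to try (arbitrary values for illustration)
noise.sd <- c(1, 5, 20, 50, 100)

# Sample correlation between x and y for each noise level
sample.cor <- sapply(noise.sd, function(s) {
  y <- a * x + b + rnorm(n = 1000, mean = 0, sd = s)
  cor(x, y)
})

# Theoretical correlation, using the population sd of x (15)
expected.cor <- a * 15 / sqrt(a^2 * 15^2 + noise.sd^2)

results <- data.frame(noise.sd, sample.cor, expected.cor)
print(round(results, 3))
```

Rearranging the same formula gives the noise sd needed for a target correlation r, namely a * sd_x * sqrt(1 - r^2) / r, which can be a convenient way to pick sd instead of trial and error.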

R18 answered Nov 17 '25 09:11
