Sampling from a given probability distribution using R

Tags: random, r, sampling

Given the following probability distribution:

[Figure: bar chart of the probability for each hour]

The x-axis shows hours; the y-axis shows the probability of each hour.

How can I generate a set of 1000 random values that follow this probability distribution?

Gamp asked Oct 05 '17 13:10



2 Answers

The important function is sample. You can pass it an extra argument, prob, which gives the probability of each element. For example,

# Draw 1000 hours from 1 to 22, weighted by the height of each bar
sample(1:22, 1000, replace=TRUE, prob=c(
  0,1,0,3,7,14,30,24,5,3,3,2,4,3,1,2,3,2,2,2,1,0
))

(Replace that string of numbers with the heights of your bars.) The prob argument doesn't have to sum to one; R will renormalise it for you.
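A quick way to see the renormalisation: draw once with raw weights and once with the same weights divided by their sum, using the same seed. The two-category weights below are made up for illustration.

```r
# Raw weights (they sum to 4, not 1) and the same weights normalised by hand
w_raw  <- c(1, 3)
w_norm <- w_raw / sum(w_raw)   # c(0.25, 0.75)

set.seed(42)
a <- sample(1:2, 1000, replace = TRUE, prob = w_raw)

set.seed(42)
b <- sample(1:2, 1000, replace = TRUE, prob = w_norm)

# With the same seed, both calls produce exactly the same draws
identical(a, b)

# Observed proportions should be close to 0.25 and 0.75
prop.table(table(a))
```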

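According to R's documentation, when replace=TRUE and there are more than 200 reasonably probable values, sample uses Walker's alias method, which gives results that are not comparable with those from old versions of R (before 2.2.0). This is normal, and nothing to worry about.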
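The idea behind Walker's alias method is that, after a one-off O(n) setup over the weights, each draw needs only one uniform column pick and one biased coin flip. The sketch below is Vose's variant of the method in plain R, for illustration only; it is not R's internal C implementation, and `build_alias` and `alias_sample` are hypothetical names.

```r
# Hypothetical helper: build an alias table from non-negative weights w
# (w does not need to sum to 1; it is normalised here)
build_alias <- function(w) {
  n <- length(w)
  p <- w / sum(w) * n            # scale so the average column height is 1
  prob  <- numeric(n)            # chance of keeping column i itself
  alias <- integer(n)            # column that receives the rest of i's slot
  small <- which(p < 1)
  large <- which(p >= 1)
  while (length(small) > 0 && length(large) > 0) {
    s <- small[1]; small <- small[-1]
    l <- large[1]; large <- large[-1]
    prob[s]  <- p[s]
    alias[s] <- l
    p[l] <- (p[l] + p[s]) - 1    # l donates 1 - p[s] to top up column s
    if (p[l] < 1) small <- c(small, l) else large <- c(large, l)
  }
  # Leftovers are (up to rounding) exactly full: always keep them
  rest <- c(small, large)
  prob[rest]  <- 1
  alias[rest] <- rest
  list(prob = prob, alias = alias)
}

# Hypothetical helper: draw `size` column indices from an alias table
alias_sample <- function(tab, size) {
  n    <- length(tab$prob)
  col  <- sample.int(n, size, replace = TRUE)  # pick a column uniformly
  coin <- runif(size)                           # then flip a biased coin
  ifelse(coin < tab$prob[col], col, tab$alias[col])
}

# Bar heights from the example above, hours 1 to 22
w   <- c(0,1,0,3,7,14,30,24,5,3,3,2,4,3,1,2,3,2,2,2,1,0)
tab <- build_alias(w)
set.seed(1)
x <- alias_sample(tab, 1000)
```

Here x holds bar indices 1 to 22, and zero-height bars never appear. In practice sample(..., prob = ...) is the simpler choice; the sketch only shows how the method works.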

JDL answered Sep 22 '22 22:09



First, build a vector describing this probability distribution, then use sample:

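# Each hour appears as many times as its (approximate) bar height
distribution <- c( 2, 4, 4, rep(5, 7), rep(6, 14), rep(7, 29),
               rep(8, 23), rep(9, 7), rep(10, 4), rep(11, 3))
# Drawing uniformly from this vector reproduces the bar proportions
sample(distribution, 1000, replace=TRUE)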

I left out the values after 11 and probably did not read all the values exactly, but you can see the idea. Depending on the format your data is currently in, the distribution vector may be easier to produce.
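If the bar heights are already available as counts, one way to produce the vector is a single rep() call with a times vector. A minimal sketch, assuming the hours and counts below, which are read off the hand-typed vector in this answer rather than the original figure:

```r
# Hours (x-values) and how often each appears in the hand-typed vector
hours  <- c(2, 4, 5, 6, 7, 8, 9, 10, 11)
counts <- c(1, 2, 7, 14, 29, 23, 7, 4, 3)

# rep() with a `times` vector repeats each hour by its own count
distribution <- rep(hours, times = counts)

# Same vector as the one built with individual rep() calls
manual <- c(2, 4, 4, rep(5, 7), rep(6, 14), rep(7, 29),
            rep(8, 23), rep(9, 7), rep(10, 4), rep(11, 3))
identical(distribution, manual)

set.seed(1)
x <- sample(distribution, 1000, replace = TRUE)
```

With counts in hand, sample(hours, 1000, replace = TRUE, prob = counts) draws from the same distribution without building the long vector, since prob does not need to sum to one.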

Bernhard answered Sep 25 '22 22:09
