Given the probability distribution as follows:
The x-axis represents hours and the y-axis gives the probability for each hour.
How can I generate a set of 1000 random values that follow this probability distribution?
The important function is sample. You can specify an extra argument prob to sample which specifies the probabilities for each element. For example,
sample(1:22, 1000, replace=TRUE, prob=c(
0,1,0,3,7,14,30,24,5,3,3,2,4,3,1,2,3,2,2,2,1,0
))
(replace that string of numbers with the heights of your bars). The prob argument doesn't have to sum to one; R will renormalise it for you.
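R may generate a warning that it is using "Walker's alias method" and that the results are not comparable to old versions of R. According to the documentation for sample, this method is used when sampling with replacement from more than 200 reasonably probable values. The warning is normal and nothing to worry about.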
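To check that the draws really follow the bar heights, you can compare the empirical frequencies with the normalised weights. This is only a sketch: the names weights, hours, target and empirical are made up for illustration, and the seed is arbitrary.

```r
# Relative bar heights for hours 1..22 (the numbers from the example above).
weights <- c(0, 1, 0, 3, 7, 14, 30, 24, 5, 3, 3, 2, 4, 3, 1, 2, 3, 2, 2, 2, 1, 0)

set.seed(42)  # arbitrary seed, only so the run is reproducible
hours <- sample(seq_along(weights), 1000, replace = TRUE, prob = weights)

# Target probabilities after normalisation, and the observed share of each hour.
target    <- weights / sum(weights)
counts    <- table(factor(hours, levels = seq_along(weights)))
empirical <- as.numeric(counts) / length(hours)

# Side-by-side comparison; hours with weight 0 should never be drawn.
round(cbind(hour = seq_along(weights), target, empirical), 3)
```

With 1000 draws the two columns should be close but not identical; increasing the sample size brings them closer together.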
First, build a vector describing this probability distribution, then use sample:
distribution <- c( 2, 4, 4, rep(5, 7), rep(6, 14), rep(7, 29),
rep(8, 23), rep(9, 7), rep(10, 4), rep(11, 3))
sample(distribution, 1000, replace=TRUE)
I left out the values after 11 and probably did not read all the values exactly, but you can see the idea. Depending on the format your data is currently in, the distribution vector may be easier to produce.
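If the bar heights are already in a table, the distribution vector can be built with rep instead of typing it out. The sketch below assumes a hypothetical data frame bars holding the same hours and counts as the vector above; it also shows that passing the heights as prob gives the same distribution without expanding anything.

```r
# Hypothetical summary of the chart: one row per hour with its relative height.
# The numbers mirror the hand-built vector above (hours after 11 left out).
bars <- data.frame(
  hour   = c(2, 4, 5, 6, 7, 8, 9, 10, 11),
  height = c(1, 2, 7, 14, 29, 23, 7, 4, 3)
)

# Expand the table into the long vector: each hour repeated 'height' times.
distribution <- rep(bars$hour, times = bars$height)

set.seed(1)  # arbitrary seed for reproducibility
draws_from_vector <- sample(distribution, 1000, replace = TRUE)

# Alternative: skip the expansion and use the heights as weights directly.
draws_from_weights <- sample(bars$hour, 1000, replace = TRUE, prob = bars$height)
```

Both calls sample from the same distribution; the rep version needs integer heights, while the prob version also works with fractional ones.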