Say I have a simple array, with a corresponding probability distribution.
library(stats)
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)
Is there a way I could generate another set of data using the same distribution? Since the operation is probabilistic, the new data need not match the initial distribution exactly; it only has to be generated from it.
I did have success finding a simple solution on my own. Thanks!
Your best bet is to compute the empirical cumulative distribution function, approximate its inverse, and then transform uniform random draws through that inverse.
The compound expression looks like
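random.points <- approx(
  cumsum(pdf_of_data$y) / sum(pdf_of_data$y),  # approximate CDF on the density grid
  pdf_of_data$x,                               # grid values the CDF maps back to
  xout = runif(10000),                         # uniform draws to push through the inverse
  rule = 2                                     # clamp draws below the first CDF value instead of returning NA
)$y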
Yields
hist(random.points, 100)
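For reuse, the same inverse-transform idea can be wrapped in a small function. This is only a sketch: the name `sample_from_density` is made up for illustration, the seed is fixed purely for reproducibility, and `rule = 2` is used so that uniform draws below the smallest CDF value are clamped to the left end of the grid rather than returned as `NA`.

```r
# Inverse-transform sampling from a kernel density estimate (sketch).
set.seed(42)  # assumption: fixed seed only so the example is reproducible

data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# Hypothetical helper: draw n values from a `density` object.
sample_from_density <- function(dens, n) {
  # Approximate CDF on the evaluation grid, normalised so it ends at 1.
  cdf <- cumsum(dens$y) / sum(dens$y)
  # Interpolate the inverse CDF at uniform draws; rule = 2 clamps
  # out-of-range draws to the nearest grid end instead of giving NA.
  approx(cdf, dens$x, xout = runif(n), rule = 2)$y
}

random_points <- sample_from_density(pdf_of_data, 10000)

# Quick sanity checks: no missing values, all draws inside the grid range.
anyNA(random_points)
range(random_points)
```

The draws should stay within the `from`/`to` range passed to `density()`, since the interpolation only maps back onto that grid.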
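To draw from the curve directly, sample the grid points with probabilities proportional to the estimated density (the draws are restricted to the values in pdf_of_data$x):
sample(pdf_of_data$x, 1e6, replace = TRUE, prob = pdf_of_data$y)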
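Because `sample()` can only return values that sit on the density grid, the result has at most as many distinct values as the grid has points (512 by default in `density()`). If continuous-looking draws are wanted, one possible workaround, not part of the answer above, is to add uniform jitter of half a grid step to each draw. The names `grid_draws`, `smooth_draws`, and `step` are illustrative, and the jitter can push values up to half a step outside the original range.

```r
# Weighted sampling from the density grid, plus optional jitter (sketch).
set.seed(1)  # assumption: fixed seed only so the example is reproducible

data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

n_draws <- 1e5

# Grid-point draws; `prob` does not need to sum to one, sample() normalises it.
grid_draws <- sample(pdf_of_data$x, n_draws, replace = TRUE, prob = pdf_of_data$y)

# Spread each draw uniformly over its grid cell so values are not
# limited to the grid points (may overshoot the range by step / 2).
step <- diff(pdf_of_data$x[1:2])
smooth_draws <- grid_draws + runif(n_draws, -step / 2, step / 2)

# The un-jittered draws take at most length(pdf_of_data$x) distinct values.
length(unique(grid_draws))
```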