
R: Generate data from a probability density distribution

Say I have a simple numeric vector and a kernel density estimate of its probability distribution:

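library(stats)  # attached by default in R; shown for completeness
data <- c(0, 0.08, 0.15, 0.28, 0.90)
# Kernel density estimate evaluated on [0, 1] with bandwidth 0.1
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)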

Is there a way I could generate another set of data from the same distribution? Since the operation is probabilistic, the new data need not exactly match the initial distribution; it only needs to be drawn from it.

I did not have success finding a simple solution on my own. Thanks!

puslet88 asked Sep 30 '15 16:09



2 Answers

Your best bet is to build an approximate cumulative distribution function from the density estimate, approximate its inverse, and then pass uniform random draws through that inverse (inverse transform sampling).

The compound expression looks like

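random.points <- approx(
  cumsum(pdf_of_data$y)/sum(pdf_of_data$y),  # approximate CDF on the density grid
  pdf_of_data$x,                             # grid points, i.e. the values to return
  runif(10000),                              # uniform draws to transform
  rule = 2                                   # clamp draws below the first CDF value instead of returning NA
)$y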

Plotting a histogram of the draws yields

hist(random.points, 100)

[Image: histogram of random.points with 100 bins]
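
For readers who want the three steps spelled out, here is a minimal sketch of the same idea wrapped in a function. The helper name `sample_from_density` is hypothetical (it is not part of R or of the answer above), and the choices `ties = "ordered"` and `rule = 2` are assumptions made so that flat stretches of the CDF and very small uniform draws do not produce warnings or NA values.

```r
# Hypothetical helper (illustration only): inverse transform sampling from the
# output of density(), split into explicit steps.
sample_from_density <- function(dens, n) {
  # Step 1: approximate the CDF on the density grid, normalised to end at 1.
  cdf <- cumsum(dens$y) / sum(dens$y)
  # Step 2: interpolate the inverse CDF (CDF value -> grid point).
  # ties = "ordered" tolerates repeated CDF values; rule = 2 clamps draws that
  # fall below the first CDF value to the first grid point.
  inv_cdf <- approxfun(cdf, dens$x, ties = "ordered", rule = 2)
  # Step 3: push uniform draws through the inverse CDF.
  inv_cdf(runif(n))
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)
random.points <- sample_from_density(pdf_of_data, 10000)
range(random.points)  # all draws lie within the grid range [0, 1]
```

Because the inverse is linearly interpolated between grid points, the draws are continuous values rather than being restricted to the grid itself.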

user295691 answered Oct 12 '22 02:10



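To draw from the curve, sample the grid points of the density estimate with replacement, weighted by the estimated density:

sample(pdf_of_data$x, size = 1e6, replace = TRUE, prob = pdf_of_data$y)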
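
One thing to keep in mind is that this approach can only return values that lie on the grid where `density()` evaluated the curve (512 points by default). The sketch below checks that, and then shows one possible way to get continuous values by spreading each draw uniformly over its grid cell. The jitter step is an assumption added for illustration, not part of the answer, and it can push values up to half a grid step outside [0, 1].

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# Weighted sampling of grid points, as in the answer (fewer draws for speed).
draws <- sample(pdf_of_data$x, size = 1e5, replace = TRUE, prob = pdf_of_data$y)
all(draws %in% pdf_of_data$x)  # every draw is one of the grid points

# Optional (assumption): add uniform noise of at most half a grid step so the
# draws are no longer confined to the grid. density() uses an equally spaced
# grid, so the first difference gives the step size.
step <- diff(pdf_of_data$x[1:2])
jittered <- draws + runif(length(draws), -step / 2, step / 2)
```

With a fine grid the difference is rarely visible in a histogram, so the jitter only matters when exact repeated values would be a problem downstream.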
Neal Fultz answered Oct 12 '22 01:10
