Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate a random number from a density object (or more broadly from a set of numbers)

Tags:

random

r

Let's say I have a set of numbers that I suspect come from the same distribution.

set.seed(20130613)
x <- rcauchy(10)

I would like a function that randomly generates a number from that same unknown distribution. One approach I have thought of is to create a density object and then get the CDF from that and take the inverse CDF of a random uniform variable (see Wikipedia).

den <- density(x)

#' Generate n random numbers from density() object
#' 
#' @param n The total random numbers to generate
#' @param den The density object from which to generate random numbers
rden <- function(n, den)
{
        diffs <- diff(den$x)
        # Making sure we have equal increments
        stopifnot(all(abs(diff(den$x) - mean(diff(den$x))) < 1e-9))
        total <- sum(den$y)
        den$y <- den$y / total
        ydistr <- cumsum(den$y)
        yunif <- runif(n)
        indices <- sapply(yunif, function(y) min(which(ydistr > y)))
        x <- den$x[indices]

        return(x)
}

rden(1, den)
## [1] -0.1854121

My questions are the following:

  1. Is there a better (or built into R) way to generate a random number from a density object?
  2. Are there any other ideas on how to generate a random number from a set of numbers (besides sample)?
like image 914
James Pringle Avatar asked Jun 13 '13 12:06

James Pringle


2 Answers

To generate data from a density estimate you just randomly choose one of the original data points and add a random "error" piece based on the kernel from the density estimate, for the default of "Gaussian" this just means choose a random element from the original vector and add a random normal with mean 0 and sd equal to the bandwidth used:

den <- density(x)

N <- 1000
newx <- sample(x, N, replace=TRUE) + rnorm(N, 0, den$bw)

Another option is to fit a density using the logspline function from the logspline package (uses a different method of estimating a density), then use the rlogspline function in that package to generate new data from the estimated density.

like image 101
Greg Snow Avatar answered Oct 04 '22 20:10

Greg Snow


If all you need is to draw values from your existing pool of numbers, then sample is the way to go.
If you want to draw from the presumed underlying distribution, then use density , and fit that to your presumed distribution to get the necessary coefficients (mean, sd, etc.), and use the appropriate R distribution function.

Beyond that, I'd take a look at Chapter7.3 ("rejection method") of Numerical Recipes in C for ways to "selectively" sample according to any distribution. The code is simple enough to be easily translated into R . My bet is someone already has done so and will post a better answer than this.

like image 43
Carl Witthoft Avatar answered Oct 04 '22 19:10

Carl Witthoft