
Find the probability density of a new data point using "density" function in R

Tags: r, probability

I am trying to find the best PDF for continuous data with an unknown distribution, using the "density" function in R. Given a new data point, I want to find the probability density at that point based on the kernel density estimate returned by "density". How can I do that?

programmingIsFun asked Jan 21 '15 21:01



2 Answers

If your new point lies within the range of x values on which density evaluated the estimate, this is fairly easy: I'd suggest using approx (or approxfun if you need it as a function) to interpolate between the grid values.

Here's an example:

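set.seed(2937107)
x <- rnorm(10, 30, 3)    # sample data
dx <- density(x)         # KDE evaluated on a grid: dx$x, dx$y
xnew <- 32.137
approx(dx$x, dx$y, xout = xnew)    # list; $y is the interpolated density at xnew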

If we plot the density and the new point we can see it's doing what you need:

[Figure: the density curve with the new point marked on it]
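The plot can be redrawn with something like the following sketch. It uses base graphics and repeats the example data so that it runs on its own; the title, the rug and the colour are arbitrary choices, not taken from the original answer.

```r
set.seed(2937107)
x    <- rnorm(10, 30, 3)
dx   <- density(x)
xnew <- 32.137
ynew <- approx(dx$x, dx$y, xout = xnew)$y    # interpolated density at xnew

plot(dx, main = "KDE with interpolated point")   # density curve
rug(x)                                           # original observations
points(xnew, ynew, col = "red", pch = 19)        # the new point, on the curve
```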

approx returns NA if the new value lies outside the grid and would need to be extrapolated (this is its default behaviour, rule = 1). If you want to handle extrapolation, I'd suggest computing the KDE directly at that point, using the bandwidth from the KDE you already have (dx$bw).
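If a function is more convenient than a one-off call, approxfun wraps the same interpolation. The sketch below also shows one possible way to avoid NA outside the grid, by declaring the density to be 0 there through the yleft and yright arguments. That is an assumption of this sketch rather than part of the answer: the true Gaussian KDE is small but not exactly zero beyond the grid, which by default extends cut = 3 bandwidths past the extremes of the data.

```r
set.seed(2937107)
x  <- rnorm(10, 30, 3)
dx <- density(x)

# Interpolating function over the grid; NA outside range(dx$x) (rule = 1)
f <- approxfun(dx$x, dx$y)

# Variant returning 0 instead of NA outside the grid (an approximation)
f0 <- approxfun(dx$x, dx$y, yleft = 0, yright = 0)

f(32.137)           # same value as approx(dx$x, dx$y, xout = 32.137)$y
f(c(100, -100))     # NA NA
f0(c(100, -100))    # 0 0
```

Inside the grid f and f0 give identical values; they differ only in what they report for points that would need extrapolation.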

Glen_b answered Oct 21 '22 16:10


This question is a year old, but here is a complete solution nonetheless. Let's call

d <- density(xs)

and define h = d$bw. Your kernel density estimate is completely determined by

  • the elements of xs,
  • the bandwidth h,
  • the type of kernel function.

Given a new value t, you can compute the corresponding y(t) using the following function, which assumes the estimate was computed with Gaussian kernels (the default for density).

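h <- d$bw    # bandwidth chosen by density()

myKDE <- function(t) {
    # Gaussian KDE at a single point t, using xs and h defined above
    kernelValues <- rep(0, length(xs))
    for (i in seq_along(xs)) {
        transformed <- (t - xs[i]) / h
        kernelValues[i] <- dnorm(transformed, mean = 0, sd = 1) / h
    }
    return(sum(kernelValues) / length(xs))
}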

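myKDE computes y(t) directly from the definition of the kernel density estimate: the average, over all elements of xs, of a Gaussian kernel with standard deviation h centred on that element and evaluated at t.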
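Because dnorm is vectorised, the loop can be collapsed into a single expression. The sketch below is one equivalent way to write it, followed by a rough check against the grid that density returns. The name kde_at and the example data are assumptions made for illustration. The two results should agree closely but not exactly, since density evaluates the estimate on a grid using a binned approximation.

```r
set.seed(1)
xs <- rnorm(50, 30, 3)    # example data (an assumption; use your own xs)
d  <- density(xs)         # Gaussian kernel by default
h  <- d$bw

# Gaussian KDE at each element of t:
#   y(t) = (1 / (n * h)) * sum_i dnorm((t - xs[i]) / h)
kde_at <- function(t, xs, h) {
    vapply(t, function(ti) mean(dnorm((ti - xs) / h)) / h, numeric(1))
}

t_new  <- c(25, 30, 32.137)
direct <- kde_at(t_new, xs, h)                 # by definition
grid   <- approx(d$x, d$y, xout = t_new)$y     # interpolated from density()
cbind(t_new, direct, grid)    # the two columns should be nearly identical

kde_at(60, xs, h)             # still defined far outside the grid
```

Passing xs and h as arguments rather than reading them from the enclosing environment is a design choice of this sketch; it makes the function reusable for several data sets.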

Antoine answered Oct 21 '22 16:10