I am trying to estimate the PDF of continuous data with an unknown distribution, using the "density" function in R. Now, given a new data point, I want to find the probability density at that point, based on the kernel density estimate returned by "density". How can I do that?
To plot the probability density function of a t distribution in R, we can use two functions: dt(x, df) evaluates the probability density function, and curve(expr, from = NULL, to = NULL) plots it.
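As a small illustration of the two calls just mentioned, the sketch below plots a t density. The choice of 10 degrees of freedom and the plotting range are assumptions made only for the example.

```r
# Plot the density of a t distribution.
# Assumption: df = 10 and the range [-4, 4] are arbitrary illustrative choices.
res <- curve(dt(x, df = 10), from = -4, to = 4,
             xlab = "x", ylab = "density", main = "t density, df = 10")

# curve() invisibly returns the x/y values it plotted.
head(res$y)
```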
The function f_X(x) gives us the probability density at point x. It is the limit of the probability of the interval (x, x+Δ] divided by the length of the interval, as the length of the interval goes to 0. Remember that P(x < X ≤ x+Δ) = F_X(x+Δ) − F_X(x).
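The limit described above can be checked numerically. This is only a sketch: it uses the standard normal distribution (an assumption, chosen because R provides both its c.d.f., pnorm, and its density, dnorm) and an arbitrary point x = 0.5.

```r
# Finite-difference check of the density definition.
# Assumption: standard normal distribution, evaluated at x = 0.5.
x <- 0.5
deltas <- c(1e-1, 1e-2, 1e-3, 1e-4)

# P(x < X <= x + delta) / delta for shrinking interval lengths
ratios <- (pnorm(x + deltas) - pnorm(x)) / deltas

# The ratios approach dnorm(x) as delta shrinks
print(cbind(delta = deltas, ratio = ratios, density = dnorm(x)))
```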
To get a probability from a probability density function, we integrate the density over an interval, that is, we take the area under the curve. When the density is constant over the interval, the area is simply density × interval length. In our example, the interval length is 131 − 41 = 90, so the area under the curve is 0.011 × 90 = 0.99, or approximately 1.
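The arithmetic above can be reproduced with integrate(). The sketch assumes a uniform density on [41, 131], which is consistent with the quoted numbers (1/90 ≈ 0.011) but is not stated in the text. The second part uses a hypothetical normal density to show that the "density × length" shortcut does not hold when the density is not constant.

```r
# Assumption: uniform density on [41, 131], so its height is 1/90.
f_unif <- function(x) dunif(x, min = 41, max = 131)
p_unif <- integrate(f_unif, lower = 41, upper = 131)$value

# Hypothetical non-constant density: normal with mean 86 and sd 15.
f_norm <- function(x) dnorm(x, mean = 86, sd = 15)
p_norm   <- integrate(f_norm, lower = 71, upper = 101)$value  # true area
shortcut <- f_norm(86) * (101 - 71)                           # peak height x length

print(c(uniform = p_unif, normal = p_norm, shortcut = shortcut))
```

For the normal case the shortcut overstates the probability, because the density falls away from its peak inside the interval.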
pnorm is the R function that computes the c.d.f. of the normal distribution.
If your new point is within the range of values produced by density(), it's fairly easy to do: I'd suggest using approx() (or approxfun() if you need it as a function) to handle the interpolation between the grid values.
Here's an example:
set.seed(2937107)
x <- rnorm(10,30,3)
dx <- density(x)
xnew <- 32.137
approx(dx$x,dx$y,xout=xnew)
If we plot the density and the new point we can see it's doing what you need:
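The original figure is not reproduced here. A plot along these lines could be drawn as follows; the colour, point style, and title are arbitrary choices, and ynew is a name introduced only for this sketch.

```r
set.seed(2937107)
x <- rnorm(10, 30, 3)
dx <- density(x)
xnew <- 32.137

# Interpolated density at the new point (ynew is a hypothetical helper name)
ynew <- approx(dx$x, dx$y, xout = xnew)$y

# Draw the KDE and mark the interpolated value on the curve
plot(dx, main = "KDE with interpolated point")
points(xnew, ynew, col = "red", pch = 19)
```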
This will return NA if the new value would need to be extrapolated. If you want to handle extrapolation, I'd suggest direct computation of the KDE for that point (using the bandwidth from the KDE you have).
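For reference, the function form of the same interpolation might look like the sketch below, which also shows the NA result outside the grid. The name kde_fun is hypothetical.

```r
set.seed(2937107)
x <- rnorm(10, 30, 3)
dx <- density(x)
xnew <- 32.137

# approxfun() returns a function that interpolates linearly on the density grid
kde_fun <- approxfun(dx$x, dx$y)

inside  <- kde_fun(xnew)             # same value as approx(dx$x, dx$y, xout = xnew)$y
outside <- kde_fun(max(dx$x) + 1)    # beyond the grid: NA with the default rule

print(c(inside = inside, outside = outside))
```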
This question is a year old, but here is a complete solution nevertheless. Let's call
d <- density(xs)
and define h = d$bw. Your kernel density estimate is completely determined by xs, h, and the choice of kernel. Given a new value t, you can compute the corresponding y(t) using the following function, which assumes you used Gaussian kernels for the estimation.
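myKDE <- function(t) {
  # Relies on xs (the data) and h (the bandwidth, d$bw) defined outside the function
  kernelValues <- rep(0, length(xs))
  for (i in seq_along(xs)) {
    # Gaussian kernel centred at xs[i], scaled by the bandwidth h
    transformed <- (t - xs[i]) / h
    kernelValues[i] <- dnorm(transformed, mean = 0, sd = 1) / h
  }
  # The average of the kernel contributions is the density estimate at t
  return(sum(kernelValues) / length(xs))
}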
What myKDE does is compute y(t) directly from the definition of the kernel density estimate.
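As a rough cross-check of this approach, a vectorised variant can be compared against the grid that density() returns. The sample xs, the seed, and the helper name kde_at are hypothetical, and a Gaussian kernel (the default in density()) is assumed. Since density() evaluates a binned approximation on a grid, small discrepancies are to be expected rather than exact agreement.

```r
set.seed(1)
xs <- rnorm(50, mean = 30, sd = 3)  # hypothetical sample
d  <- density(xs)                   # Gaussian kernel by default
h  <- d$bw

# Hypothetical helper: Gaussian KDE evaluated at each element of t.
# dnorm(ti, mean = xs, sd = h) equals dnorm((ti - xs) / h) / h.
kde_at <- function(t) {
  vapply(t, function(ti) mean(dnorm(ti, mean = xs, sd = h)), numeric(1))
}

# Works inside and outside the range of the density() grid (no NA)
print(kde_at(c(25, 30, 100)))

# Largest absolute difference from the values density() reports on its grid
max_diff <- max(abs(kde_at(d$x) - d$y))
print(max_diff)
```

Unlike interpolation with approx(), this evaluates the estimate at any point, so values far outside the data simply come out close to zero.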