Find the probability density of a new data point using the "density" function in R

Q: How do you find the probability density of data?

To get a probability from a probability density function, integrate the density over an interval: the probability is the area under the curve on that interval. When the density is constant over the interval, the area is simply density × interval length. In our example, the interval length is 131 - 41 = 90, so the area under the curve is 0.011 × 90 = 0.99, or approximately 1.
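The area calculation above can be sketched in R. This is a minimal illustration that assumes a uniform density on [41, 131]. That assumption is chosen only to match the numbers quoted above, since the original example data are not shown. The names lo, hi, height, area_rect, area_int and p_sub are made up for the sketch.

```r
# Assumption: a uniform density on [41, 131], chosen to mirror the numbers above.
lo <- 41
hi <- 131

# Height of the density anywhere inside the interval: 1 / (hi - lo), about 0.011.
height <- dunif(50, min = lo, max = hi)

# Area by the rectangle rule (density x interval length).
area_rect <- height * (hi - lo)

# Area by numerical integration; this also works when the density is not constant.
area_int <- integrate(dunif, lower = lo, upper = hi, min = lo, max = hi)$value

# Probability of a sub-interval, here P(60 <= X <= 80).
p_sub <- integrate(dunif, lower = 60, upper = 80, min = lo, max = hi)$value
```

Under this assumption the total area is exactly 1; the 0.99 above comes from rounding the density to 0.011.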

Q: Which R function is used for computing probability density function?

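dnorm is the R function that calculates the p.d.f. of a normal distribution; pnorm calculates its c.d.f. To estimate a density from data with an unknown distribution, use density.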
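To make the distinction concrete, here is a small sketch using the standard normal distribution, which is chosen purely for illustration. The variable names are hypothetical.

```r
# Density (p.d.f.) of the standard normal at 0: a height, not a probability.
pdf_at_0 <- dnorm(0)

# Cumulative probability P(X <= 0) from the c.d.f.
cdf_at_0 <- pnorm(0)

# P(-1 <= X <= 1) two ways: difference of c.d.f. values, or integral of the p.d.f.
p_cdf <- pnorm(1) - pnorm(-1)
p_int <- integrate(dnorm, lower = -1, upper = 1)$value

# For data with an unknown distribution, density() returns a kernel estimate
# evaluated on a grid (components x and y) together with the bandwidth bw.
set.seed(1)
est <- density(rnorm(200))
```

The two probability calculations should agree up to numerical integration error.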

Tags: r, probability

I am trying to estimate the PDF of continuous data with an unknown distribution, using the "density" function in R. Given a new data point, I want to find its probability density based on the kernel density estimate returned by "density". How can I do that?


asked Jan 21 '15 21:01

programmingIsFun

2 Answers

If your new point will be within the range of values produced by density, it's fairly easy to do -- I'd suggest using approx (or approxfun if you need it as a function) to handle the interpolation between the grid-values.

Here's an example:

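set.seed(2937107)
x <- rnorm(10, 30, 3)   # sample data: 10 draws from N(30, 3)
dx <- density(x)        # kernel density estimate on a grid (dx$x, dx$y)
xnew <- 32.137          # new point at which to evaluate the density
# linear interpolation between grid values; the result's $y is the density at xnew
approx(dx$x, dx$y, xout = xnew)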

If we plot the density and the new point we can see it's doing what you need:

[Figure: plot of the estimated density with the new point marked]

This will return NA if the new value would need to be extrapolated. If you want to handle extrapolation, I'd suggest direct computation of the KDE for that point (using the bandwidth from the KDE you have).
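If a reusable function is more convenient, the same interpolation can be wrapped with approxfun. The sketch below reuses the example data; kde_fun and kde_fun0 are hypothetical names. Returning 0 outside the grid is one possible simplification, not an exact evaluation of the KDE tails.

```r
set.seed(2937107)
x  <- rnorm(10, 30, 3)
dx <- density(x)

# Interpolating function over the grid returned by density().
kde_fun <- approxfun(dx$x, dx$y)

inside  <- kde_fun(32.137)          # within the grid: interpolated density
outside <- kde_fun(max(dx$x) + 1)   # beyond the grid: NA by default (rule = 1)

# One way to avoid NA: treat the density as 0 outside the grid.
# This is a simplification; the true Gaussian KDE is small but nonzero there.
kde_fun0 <- approxfun(dx$x, dx$y, yleft = 0, yright = 0)
outside0 <- kde_fun0(max(dx$x) + 1)
```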


answered Oct 21 '22 16:10

Glen_b

This is one year old, but nevertheless, here is a complete solution. Let's call

d <- density(xs)

and define h = d$bw. Your kernel density estimate is completely determined by:

the elements of xs,
the bandwidth h,
the type of kernel functions.

Given a new value t, you can compute the corresponding y(t), using the following function, which assumes you have used Gaussian kernels for estimation.

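# Gaussian KDE evaluated at a single point t.
# Relies on xs (the data) and h (the bandwidth, d$bw) defined above.
myKDE <- function(t){
    kernelValues <- rep(0, length(xs))
    for(i in seq_along(xs)){
        transformed <- (t - xs[i]) / h
        kernelValues[i] <- dnorm(transformed, mean = 0, sd = 1) / h
    }
    return(sum(kernelValues) / length(xs))
}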

myKDE computes y(t) directly from the definition of the kernel density estimate.
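For reference, the Gaussian kernel density estimate at t is y(t) = (1 / (n h)) * sum over i of K((t - x_i) / h), where K is the standard normal density. Below is a minimal vectorized sketch of the same computation, with a check against the interpolated output of density. It assumes the default Gaussian kernel. The helper name kde_at and the sample data are hypothetical.

```r
set.seed(2937107)
xs <- rnorm(10, 30, 3)
d  <- density(xs)        # Gaussian kernel by default
h  <- d$bw

# Hypothetical helper: Gaussian KDE evaluated directly at each element of t.
kde_at <- function(t, xs, h) {
  sapply(t, function(ti) mean(dnorm((ti - xs) / h)) / h)
}

t_in  <- 32.137
t_out <- max(d$x) + 1    # outside the grid, where approx() would return NA

direct   <- kde_at(t_in, xs, h)               # by the definition
interp   <- approx(d$x, d$y, xout = t_in)$y   # interpolated from the grid
far_tail <- kde_at(t_out, xs, h)              # direct evaluation still works here
```

The direct and interpolated values should agree closely but not exactly, since density evaluates the estimate on a finite grid using an approximation.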

answered Oct 21 '22 16:10

Antoine
