
How can I get the value of a kernel density estimate at specific points?

I am experimenting with ways to deal with overplotting in R, and one thing I want to try is to plot individual points but color them by the density of their neighborhood. In order to do this I would need to compute a 2D kernel density estimate at each point. However, it seems that the standard kernel density estimation functions are all grid-based. Is there a function for computing 2D kernel density estimates at specific points that I specify? I would imagine a function that takes x and y vectors as arguments and returns a vector of density estimates.

Ryan C. Thompson asked Apr 24 '13 20:04



1 Answer

If I understand what you want to do, it could be achieved by fitting a smoothing model to the grid density estimate and then using that to predict the density at each point you are interested in. For example:

# Simulate some data and put in data frame DF
n <- 100
x <- rnorm(n)
y <- 3 + 2* x * rexp(n) + rnorm(n)
# add some outliers
y[sample(1:n,20)] <- rnorm(20,20,20)
DF <- data.frame(x,y)

# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)

# create a new data frame of that 2d density grid
# (dens$z has rows indexed by x and columns by y, so as.vector(dens$z)
# varies x fastest, which matches the row order of expand.grid(x, y))
gr <- data.frame(with(dens, expand.grid(x,y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Fit a model
mod <- loess(zgr~xgr*ygr, data=gr)

# Apply the model to the original data to estimate density at that point
DF$pointdens <- predict(mod, newdata=data.frame(xgr=x, ygr=y))

# Draw plot
library(ggplot2)
ggplot(DF, aes(x=x,y=y, color=pointdens)) + geom_point()

[Image: scatter plot of y against x, with points colored by pointdens]

Or, if I just change n to 10^6, we get:

[Image: the same plot with n = 10^6]
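As an alternative to smoothing the grid, the estimate can also be evaluated directly at the points of interest, which is closer to the "takes x and y vectors and returns a vector of densities" function the question describes. The following is only a sketch: `kde2d_at_points` is a hypothetical helper name, not a function from MASS, and it assumes the same Gaussian product kernel and default bandwidth rule (`bandwidth.nrd`, divided by 4) that `kde2d` uses internally.

```r
library(MASS)  # only for bandwidth.nrd, the default bandwidth rule used by kde2d

# Hypothetical helper (not part of MASS): evaluate a 2D Gaussian product-kernel
# density estimate built from the data (x, y) at the points (px, py).
kde2d_at_points <- function(x, y, px = x, py = y,
                            h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  stopifnot(length(x) == length(y), length(px) == length(py))
  h <- rep(h, length.out = 2) / 4   # kde2d rescales its bandwidths the same way
  ax <- outer(px, x, "-") / h[1]    # length(px) x length(x) scaled x-distances
  ay <- outer(py, y, "-") / h[2]    # same for y
  # Sum the kernel contributions of all data points for each evaluation point
  rowSums(dnorm(ax) * dnorm(ay)) / (length(x) * h[1] * h[2])
}

# Example on data simulated like the data above
set.seed(1)
x <- rnorm(100)
y <- 3 + 2 * x * rexp(100) + rnorm(100)
pointdens <- kde2d_at_points(x, y)  # one density value per (x, y) pair
```

Note that this builds two length(px) by length(x) matrices, so it is only practical for moderate n. For something like n = 10^6 the points would need to be processed in chunks, or the grid-plus-smoothing approach above is likely the better fit.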

Peter Ellis answered Sep 18 '22 18:09