I am experimenting with ways to deal with overplotting in R, and one thing I want to try is to plot individual points but color them by the density of their neighborhood. In order to do this I would need to compute a 2D kernel density estimate at each point. However, it seems that the standard kernel density estimation functions are all grid-based. Is there a function for computing 2D kernel density estimates at specific points that I specify? I would imagine a function that takes x and y vectors as arguments and returns a vector of density estimates.
Kernel density estimation (KDE): the estimate is obtained simply by summing the kernel values (K) contributed by every data point Xj. With reference to the table above, the KDE for the whole data set is obtained by adding all the row values. The sum is then normalized by dividing by the number of data points, which is six in this example.
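The sum-then-normalize recipe above extends directly to two dimensions with a product kernel. Evaluating it at the data points themselves gives the per-point density the question asks for, with no grid involved. Below is a minimal R sketch. It assumes a Gaussian kernel and the same default bandwidths that MASS::kde2d uses (bandwidth.nrd(), then divided by 4). kde_at_points is a hypothetical helper and is not part of MASS. Because it builds an (evaluation points × data points) matrix, it suits thousands of points, but not millions unless you process them in chunks.

```r
library(MASS)  # bandwidth.nrd(), kde2d()

# Hypothetical helper: 2D Gaussian product-kernel KDE evaluated at chosen
# points (px, py) using the data (x, y). Defaults mirror kde2d's bandwidths.
kde_at_points <- function(x, y, px = x, py = y,
                          h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  h <- h / 4                        # kde2d rescales its bandwidths by 1/4
  n <- length(x)
  ax <- outer(px, x, "-") / h[1]    # scaled x-distances, one row per eval point
  ay <- outer(py, y, "-") / h[2]    # scaled y-distances
  # sum the kernel values over all data points, then normalize by n * h1 * h2
  rowSums(dnorm(ax) * dnorm(ay)) / (n * h[1] * h[2])
}

# Usage: one density value per (x[i], y[i]), ready to map to colour
set.seed(1)
x <- rnorm(100)
y <- 3 + 2 * x * rexp(100) + rnorm(100)
pointdens <- kde_at_points(x, y)
```

When px and py are set to kde2d's grid nodes, this reproduces kde2d's z values. The grid result is therefore just this sum evaluated at regularly spaced points.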
First select the empty cell in your worksheet where you want the output table to be generated, then click the Descriptive Statistics icon in the NumXL tab and select Kernel Density Estimation from the drop-down menu.
The difference between the output of those two tools (Point Density and Line Density) and that of Kernel Density is this: in point and line density, a neighborhood is specified around each output cell and the density of the population is calculated within it. Kernel density, by contrast, spreads the known population quantity of each point outward from the point's location.
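The following minimal R sketch makes the fixed-neighborhood idea concrete at each point. It counts the data points within a radius r and divides by the area. This is only an illustration and not the ArcGIS Point Density implementation. Both neighborhood_density and the radius r are hypothetical choices. When normalized by n·πr², it is equivalent to a kernel estimate with a flat disc kernel, so every neighbor inside r counts equally instead of being down-weighted with distance.

```r
# Hypothetical helper: fixed-radius neighborhood density at (px, py).
# Counts data points within distance r of each evaluation point and
# divides by n * pi * r^2 (sample size times the area of the circle).
neighborhood_density <- function(x, y, r, px = x, py = y) {
  dx <- outer(px, x, "-")
  dy <- outer(py, y, "-")
  inside <- (dx^2 + dy^2) <= r^2          # TRUE where a data point lies in the circle
  rowSums(inside) / (length(x) * pi * r^2)
}

# Usage: each point counts itself, so every value is strictly positive
set.seed(2)
x <- rnorm(200)
y <- rnorm(200)
nd <- neighborhood_density(x, y, r = 0.5)
```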
Some of the most popular and useful density estimation techniques are mixture models, such as Gaussian mixtures (GaussianMixture), and neighbor-based approaches, such as the kernel density estimate (KernelDensity).
If I understand what you want to do, it could be achieved by fitting a smoothing model to the grid density estimate and then using that to predict the density at each point you are interested in. For example:
```r
# Simulate some data and put in data frame DF
set.seed(1)  # for reproducibility
n <- 100
x <- rnorm(n)
y <- 3 + 2 * x * rexp(n) + rnorm(n)
# add some outliers
y[sample(1:n, 20)] <- rnorm(20, 20, 20)
DF <- data.frame(x, y)

# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x, y)

# Create a new data frame of that 2d density grid.
# as.vector() reads dens$z column by column (x index varying fastest),
# which matches the row order of expand.grid(dens$x, dens$y).
gr <- data.frame(with(dens, expand.grid(x, y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Fit a model
mod <- loess(zgr ~ xgr * ygr, data = gr)

# Apply the model to the original data to estimate density at each point
DF$pointdens <- predict(mod, newdata = data.frame(xgr = x, ygr = y))

# Draw plot
library(ggplot2)
ggplot(DF, aes(x = x, y = y, color = pointdens)) + geom_point()
```
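Another technique, different from the loess fit above, is plain bilinear interpolation of the kde2d grid. It reproduces the grid values exactly at the nodes and never goes negative, because each estimate is a weighted average of four non-negative grid values. It also stays cheap for very large n. Below is a minimal base-R sketch. interp_kde2d is a hypothetical helper, and points outside the grid limits return NA. With kde2d's default limits (the data ranges), every data point falls inside the grid.

```r
library(MASS)  # kde2d()

# Hypothetical helper: bilinear interpolation of a kde2d() result at (px, py).
# Points outside the grid get NA.
interp_kde2d <- function(dens, px, py) {
  gx <- dens$x; gy <- dens$y; z <- dens$z
  i <- findInterval(px, gx, rightmost.closed = TRUE)  # lower x cell index
  j <- findInterval(py, gy, rightmost.closed = TRUE)  # lower y cell index
  out <- rep(NA_real_, length(px))
  ok <- i >= 1 & i < length(gx) & j >= 1 & j < length(gy)
  i <- i[ok]; j <- j[ok]
  tx <- (px[ok] - gx[i]) / (gx[i + 1] - gx[i])         # position within cell, 0..1
  ty <- (py[ok] - gy[j]) / (gy[j + 1] - gy[j])
  # weighted average of the four surrounding grid values
  out[ok] <- (1 - tx) * (1 - ty) * z[cbind(i, j)] +
             tx       * (1 - ty) * z[cbind(i + 1, j)] +
             (1 - tx) * ty       * z[cbind(i, j + 1)] +
             tx       * ty       * z[cbind(i + 1, j + 1)]
  out
}

# Usage, mirroring the simulated data in the answer
set.seed(1)
x <- rnorm(100)
y <- 3 + 2 * x * rexp(100) + rnorm(100)
dens <- kde2d(x, y)
pointdens <- interp_kde2d(dens, x, y)
```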
Or, if I just change n to 10^6, we get: