This is my first post to the R-community, so pardon me if it is silly. I would like to use the functions geom_density2d and stat_density2d in ggplot2 to plot kernel density estimates, but the problem is that they can't handle weighted data. From what I understand, these two functions call the function kde2d from package MASS to make the kernel density estimate. And the kde2d doesn't take data weights as a parameter.
Now, I have found this altered version of kde2d http://www.inside-r.org/node/226757, which takes weights as a parameter and is based on the source code of kde2d. The code of this function:
kde2d.weighted <- function (x, y, w, h, n = 25, lims = c(range(x), range(y))) {
nx <- length(x)
if (length(y) != nx)
stop("data vectors must be the same length")
if (length(w) != nx & length(w) != 1)
stop("weight vectors must be 1 or length of data")
gx <- seq(lims[1], lims[2], length = n) # gridpoints x
gy <- seq(lims[3], lims[4], length = n) # gridpoints y
if (missing(h))
h <- c(bandwidth.nrd(x), bandwidth.nrd(y));
if (missing(w))
w <- numeric(nx)+1;
h <- h/4
ax <- outer(gx, x, "-")/h[1] # distance of each point to each grid point in x-direction
ay <- outer(gy, y, "-")/h[2] # distance of each point to each grid point in y-direction
z <- (matrix(rep(w,n), nrow=n, ncol=nx, byrow=TRUE)*matrix(dnorm(ax), n, nx)) %*% t(matrix(dnorm(ay), n, nx))/(sum(w) * h[1] * h[2]) # z is the density
return(list(x = gx, y = gy, z = z))
}
I would like to make the functions geom_density2d and stat_density2d call kd2d.weighted instead of kde2d, and by that making them accept weighted data.
I have never changed any functions in existing R packages so my question is what is the easiest way doing this?
You can actually pass your own density data to geom_contour
which would probably be the easiest. Let's start with a sample dataset by adding weights to the geyser data.
library("MASS")
data(geyser, "MASS")
geyserw <- transform(geyser,
weight = sample(1:5, nrow(geyser), replace=T)
)
Now we use your weighted function to calculate the density and turn it into a data.frame
dens <- kde2d.weighted(geyserw$duration, geyserw$waiting, geyserw$weight)
dfdens <- data.frame(expand.grid(x=dens$x, y=dens$y), z=as.vector(dens$z))
Now we plot the data
ggplot(geyserw, aes(x = duration, y = waiting)) +
geom_point() + xlim(0.5, 6) + ylim(40, 110) +
geom_contour(aes(x=x, y=y, z=z), data= dfdens)
And that should do it
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With