I have point data off the coast of Oahu. Someone else used those same data to create a large polygon
. I believe he first created a heatmap
using a quartic (biweight) kernel
with a radius of 1 km around each point and perhaps a 1 km-square pixel size. He cited Silverman (1986, p. 76, equation 4.5, which I believe refers to the book “Density Estimation for Statistics and Data Analysis”). I believe he converted his heatmap
into his polygon
. I am attempting to approximate his polygon
with fake data using R
and Windows 10
. I can come close using the kde
function in the ks
package (see figure below). But that package only includes Gaussian kernels
. Is it possible to create a similar polygon
using a quartic kernel
?
The other analysist actually created two version of the polygon
. The border of one was labeled “> 1 per km density”; the border of the other was labeled “> 0.5 per km density”. I do not know whether he used R
, QGIS
, ArcGIS
or something else. I was unable to create a single large polygon
in QGIS
and do not have ArcGIS
.
Thank you for any suggestions on how to create a polygon
similar to the one shown but using a quartic kernel
instead of a Gaussian kernel
. If I can provide additional information please let me know.
Here is a link to my fake data in CSV
and QGIS
format: enter link description here (EDIT: Hopefully anyone can access the fake data now. I could before but I guess others could not.)
1. fake_points_oahu.csv
a. raw data
2. fake_points_oahu_utm (.shp, .dbf, .prj, .shx)
a. vector point layer
3. fake_points_oahu_June11_2021.png
a. the figure shown above
Here is my R
code:
setwd('C:/Users/mark_/Documents/ctmm/density_in_R/density_files_for_StackOverflow/')
library(sf) # to read shapefile
library(ks) # to use kde function
my.data <- read.csv("fake_points_oahu.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
head(my.data)
# Import shapefile
st_layers("fake_points_oahu_utm.shp")
points_utm <- st_read(dsn = "fake_points_oahu_utm.shp", layer = 'fake_points_oahu_utm')
st_crs(points_utm)
plot(points_utm)
my.matrix <- as.matrix(my.data[,2:3])
head(my.matrix)
# This uses the Guassian kernel
my_gps_hpi <- Hpi(x = my.matrix, pilot = "samse", pre = "scale")
my.fhat <- kde(x = my.matrix, compute.cont = TRUE, h = my_gps_hpi,
xmin = c(min(my.data$longitude), min(my.data$latitude)),
xmax = c(max(my.data$longitude), max(my.data$latitude)),
bgridsize = c(500, 500))
my.contours <- c(96.5)
contourLevels(my.fhat, cont = my.contours)
contourSizes(my.fhat, cont = my.contours, approx = TRUE)
plot(my.data$longitude, my.data$latitude)
plot(my.fhat, lwd = 3, display = "filled.contour", cont = my.contours, add = TRUE)
png(file="fake_points_oahu_June11_2021.png")
plot(my.data$longitude, my.data$latitude)
plot(my.fhat, lwd = 3, display = "filled.contour", cont = my.contours, add = TRUE)
dev.off()
You can perform your estimation by slightly modifying the kde2d function from the MASS
package. To my knowledge there is currently no package in R that implements bivariate KDE estimation with a quartic (biweight) kernel for a bivariate case.
The univariate biweight kernel can be extended to a multivariate kernel in several ways and the simplest is to just use a product kernel, where you use the univariate kernel for each of your dimensions and then multiply the result. You can find the mathematical expression for the biweight product kernel here.
When you incorporate this kernel into the kde2d
density estimator from the MASS
package it looks as follows
kde_biweight_kernel <- function(x,y, bw_x, bw_y, xrange, yrange){
# This function is based on the kde2d function from
# the MASS package. The only difference is that the Gaussian
# kernel is substituted with a biweight product kernel
# product kernel:
biweight_kernel <- function(u){
mask = abs(u) > 1
kernel_val = (15/16)*((1-u^2)^2)
kernel_val[mask] = 0
return(kernel_val)
}
lims = c(xrange, yrange)
n = 500
nx <- length(x)
n <- rep(n, length.out = 2L)
# get grid on which we want to estimate the density
gx <- seq.int(lims[1L], lims[2L], length.out = n[1L])
gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
# inputs to kernel
ax <- outer(gx, x, "-" )/bw_x
ay <- outer(gy, y, "-" )/bw_y
# evaluate and multiply kernel results along both axes
res = tcrossprod(biweight_kernel(ax), biweight_kernel(ay))/(nx * bw_x * bw_y)
return(list(x = gx, y = gy, z = res))
}
Using the kde_biweight_kernel
function you can compute your desired density as follows
library(MASS)
library(birk)
library(kedd)
library(sf)
library(ks)
# load data
my.data <- read.csv("fake_points_oahu.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
# Import shapefile
st_layers("fake_points_oahu_utm.shp")
points_utm <- st_read(dsn = "fake_points_oahu_utm.shp", layer = 'fake_points_oahu_utm')
x = my.data$longitude
y = my.data$latitude
# determine bandwidth for biweight kernel along both axes
bw_x = h.amise(x, deriv.order = 0, kernel = "biweight")$h
bw_y = h.amise(y, deriv.order = 0, kernel = "biweight")$h
# get ranges in which you want to estimate density
xrange = c(min(my.data$longitude), max(my.data$longitude))
yrange = c(min(my.data$latitude), max(my.data$latitude))
# get 2d density estimate with quartic (biweight) kernel
result = kde_biweight_kernel(x,y, bw_x, bw_y, xrange, yrange)
Note that the bandwidth is especially computed for the biweight kernel case.
The resulting density object is a bit different from the ks::kde
object. It does, for example, not have contour levels yet. We can get the contour levels by computing the quantiles with a slightly modified version of the kde2dQuantile
function from the rmngb
package
# get quantiles of interest:
kde2dQuantile <- function(d, X, Y, probs = .05) {
xInd <- sapply(X, function(x) which.closest(d$x, x))
yInd <- sapply(Y, function(x) which.closest(d$y, x))
zValues <- d$z[cbind(xInd, yInd)]
quantile(zValues, probs=probs)
}
# get quantiles
quantiles = kde2dQuantile(result, x, y, seq(0,1,by=0.001))
From your question I am not sure which quantile you are interested in so I just chose the 1% quantile.
To be able to plot the data in the same way as in the question, we have to format the density result in the same way as objects from the kde
class:
# to make the kde estimate compatible with the other density estimates
# from the ks package, the result can be converted to a named list.
# -> create ks::KDE object:
axes = matrix(c(result$x,result$y), ncol = 2)
colnames(axes) = c('longitude', 'latitude')
my.fhat_biweight = list('x' = axes,
'eval.points' = list(result$x, result$y),
'estimate' = result['z']$z,
'gridtype' = 'linear', 'gridded' = TRUE,
'binned' = TRUE, 'names' = c("longitude","latitude" ))
# add quantile to ks::KDE object
my.fhat_biweight$cont = quantiles
# change class (make sure ks package is loaded for this)
class(my.fhat_biweight) <- "kde"
Finally plot the biweight kernel density over the data
plot(my.data$longitude, my.data$latitude)
plot(my.fhat_biweight, lwd = 3, display = "filled.contour", cont = cont=c(96.5), add = TRUE)
this outputs:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With