Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting quartic kernel heatmap into large polygon with R

I have point data off the coast of Oahu. Someone else used those same data to create a large polygon. I believe he first created a heatmap using a quartic (biweight) kernel with a radius of 1 km around each point and perhaps a 1 km-square pixel size. He cited Silverman (1986, p. 76, equation 4.5, which I believe refers to the book “Density Estimation for Statistics and Data Analysis”). I believe he converted his heatmap into his polygon. I am attempting to approximate his polygon with fake data using R and Windows 10. I can come close using the kde function in the ks package (see figure below). But that package only includes Gaussian kernels. Is it possible to create a similar polygon using a quartic kernel?

enter image description here

The other analysist actually created two version of the polygon. The border of one was labeled “> 1 per km density”; the border of the other was labeled “> 0.5 per km density”. I do not know whether he used R, QGIS, ArcGIS or something else. I was unable to create a single large polygon in QGIS and do not have ArcGIS.

Thank you for any suggestions on how to create a polygon similar to the one shown but using a quartic kernel instead of a Gaussian kernel. If I can provide additional information please let me know.

Here is a link to my fake data in CSV and QGIS format: enter link description here (EDIT: Hopefully anyone can access the fake data now. I could before but I guess others could not.)

1. fake_points_oahu.csv

     a. raw data

2. fake_points_oahu_utm (.shp, .dbf, .prj, .shx) 

     a. vector point layer 

3. fake_points_oahu_June11_2021.png

     a. the figure shown above

Here is my R code:

setwd('C:/Users/mark_/Documents/ctmm/density_in_R/density_files_for_StackOverflow/')

library(sf) # to read shapefile
library(ks) # to use kde function

my.data <- read.csv("fake_points_oahu.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
head(my.data)

# Import shapefile
st_layers("fake_points_oahu_utm.shp")

points_utm <- st_read(dsn = "fake_points_oahu_utm.shp", layer = 'fake_points_oahu_utm')
st_crs(points_utm)
plot(points_utm)

my.matrix <- as.matrix(my.data[,2:3])
head(my.matrix)

# This uses the Guassian kernel
my_gps_hpi <- Hpi(x = my.matrix, pilot = "samse", pre = "scale")

my.fhat <- kde(x = my.matrix, compute.cont = TRUE, h = my_gps_hpi,
               xmin = c(min(my.data$longitude), min(my.data$latitude)),
               xmax = c(max(my.data$longitude), max(my.data$latitude)),
               bgridsize = c(500, 500))

my.contours <- c(96.5)

contourLevels(my.fhat, cont = my.contours)
contourSizes(my.fhat, cont = my.contours, approx = TRUE)

plot(my.data$longitude, my.data$latitude)
plot(my.fhat, lwd = 3, display = "filled.contour", cont = my.contours, add = TRUE)

png(file="fake_points_oahu_June11_2021.png")

     plot(my.data$longitude, my.data$latitude)
     plot(my.fhat, lwd = 3, display = "filled.contour", cont = my.contours, add = TRUE)

dev.off()
like image 272
Mark Miller Avatar asked Jun 11 '21 20:06

Mark Miller


1 Answers

You can perform your estimation by slightly modifying the kde2d function from the MASS package. To my knowledge there is currently no package in R that implements bivariate KDE estimation with a quartic (biweight) kernel for a bivariate case.

The univariate biweight kernel can be extended to a multivariate kernel in several ways and the simplest is to just use a product kernel, where you use the univariate kernel for each of your dimensions and then multiply the result. You can find the mathematical expression for the biweight product kernel here. When you incorporate this kernel into the kde2d density estimator from the MASS package it looks as follows

kde_biweight_kernel <- function(x,y, bw_x, bw_y, xrange, yrange){
  # This function is based on the kde2d function from 
  # the MASS package. The only difference is that the Gaussian
  # kernel is substituted with a biweight product kernel
  
  # product kernel:
  biweight_kernel <- function(u){
    mask = abs(u) > 1
    kernel_val = (15/16)*((1-u^2)^2)
    kernel_val[mask] = 0
    return(kernel_val)
  }
  
  lims = c(xrange, yrange)
  n = 500
  nx <- length(x)
  n <- rep(n, length.out = 2L)
  # get grid on which we want to estimate the density
  gx <- seq.int(lims[1L], lims[2L], length.out = n[1L])
  gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
  
  # inputs to kernel
  ax <- outer(gx, x, "-" )/bw_x
  ay <- outer(gy, y, "-" )/bw_y
  
  # evaluate and multiply kernel results along both axes
  res = tcrossprod(biweight_kernel(ax), biweight_kernel(ay))/(nx * bw_x * bw_y)
  return(list(x = gx, y = gy, z = res))
}

Using the kde_biweight_kernelfunction you can compute your desired density as follows

library(MASS)
library(birk)
library(kedd)
library(sf)
library(ks)


# load data
my.data <- read.csv("fake_points_oahu.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
# Import shapefile
st_layers("fake_points_oahu_utm.shp")
points_utm <- st_read(dsn = "fake_points_oahu_utm.shp", layer = 'fake_points_oahu_utm')

x = my.data$longitude
y = my.data$latitude

# determine bandwidth for biweight kernel along both axes
bw_x = h.amise(x, deriv.order = 0, kernel = "biweight")$h
bw_y = h.amise(y, deriv.order = 0, kernel = "biweight")$h

# get ranges in which you want to estimate density
xrange = c(min(my.data$longitude), max(my.data$longitude))
yrange = c(min(my.data$latitude),  max(my.data$latitude))

# get 2d density estimate with quartic (biweight) kernel
result = kde_biweight_kernel(x,y, bw_x, bw_y, xrange, yrange)

Note that the bandwidth is especially computed for the biweight kernel case. The resulting density object is a bit different from the ks::kde object. It does, for example, not have contour levels yet. We can get the contour levels by computing the quantiles with a slightly modified version of the kde2dQuantile function from the rmngbpackage

# get quantiles of interest:
kde2dQuantile <- function(d, X, Y, probs = .05) {
  xInd <- sapply(X, function(x) which.closest(d$x, x))
  yInd <- sapply(Y, function(x) which.closest(d$y, x))
  zValues <- d$z[cbind(xInd, yInd)]
  quantile(zValues, probs=probs)
}
# get quantiles
quantiles = kde2dQuantile(result, x, y, seq(0,1,by=0.001))

From your question I am not sure which quantile you are interested in so I just chose the 1% quantile. To be able to plot the data in the same way as in the question, we have to format the density result in the same way as objects from the kde class:

# to make the kde estimate compatible with the other density estimates
# from the ks package, the result can be converted to a named list.
# -> create ks::KDE object:
axes = matrix(c(result$x,result$y), ncol = 2)
colnames(axes) = c('longitude', 'latitude')

my.fhat_biweight = list('x' = axes,
                        'eval.points' = list(result$x, result$y),
                        'estimate' = result['z']$z,
                        'gridtype' = 'linear', 'gridded' = TRUE,
                        'binned' = TRUE, 'names' = c("longitude","latitude" ))

# add quantile to ks::KDE object
my.fhat_biweight$cont = quantiles

# change class (make sure ks package is loaded for this)
class(my.fhat_biweight) <- "kde"

Finally plot the biweight kernel density over the data

plot(my.data$longitude, my.data$latitude)
plot(my.fhat_biweight, lwd = 3, display = "filled.contour", cont = cont=c(96.5), add = TRUE)

this outputs:

final plot

like image 98
yuki Avatar answered Oct 16 '22 23:10

yuki