In R, I get different means from the function 'zonal.stats' in package 'spatialEco' compared to 'extract' in package 'raster'. In both cases I used a polygon as the zone and a raster for the values.
Here is an example:
library(raster)
library(spatialEco)
library(sp)
#Create raster
ras <- raster(nrows=100, ncols=80, xmn=0, xmx=1000, ymn=0, ymx=800)
val <- runif(ncell(ras))
values(ras) <- val
#Create polygon within raster extent
xym <- cbind(runif(3,0,1000), runif(3,0,800))
p <- Polygons(list(Polygon(xym)),1)
sp <- SpatialPolygons(list(p))
spdf <- SpatialPolygonsDataFrame(sp, data=data.frame(1))
#z1 zonal statistics using "spatialEco"
z1 <- zonal.stats(spdf, ras, stats="mean")
#z2 zonal statistics using "raster"
z2 <- extract(ras, spdf, fun=mean)
What causes the deviation between z1 and z2?
spatialEco::zonal.stats uses exactextractr (I have not checked the code, but it told me to install it to be able to use zonal.stats), which should be more exact when you are dealing with polygons (the raster package turns them into rasters first, see zonal below). However, the example below (it is only one case) suggests that spatialEco is less precise.
Example (avoid random numbers; but if you do use them, use set.seed). I start with very large grid cells.
library(raster)
library(spatialEco)
ras <- raster(nrows=4, ncols=4, xmn=0, xmx=1000, ymn=0, ymx=800)
values(ras) <- 1:ncell(ras)
set.seed(1)
xy <- cbind(runif(3,0,1000), runif(3,0,800))
xy <- rbind(xy, xy[1,])
sp <- spPolygons(xy, attr=data.frame(x=1))
### zonal statistics using "spatialEco"
zonal.stats(sp, ras, stats="mean")
# mean.layer
#1 7
### zonal statistics using "raster"
extract(ras, sp, fun=mean)
# [,1]
#[1,] 6
### same as
# x <- rasterize(sp, ras)
# zonal(ras, x, "mean")
With raster you can also get a more precise estimate, like this:
e <- extract(ras, sp, weights=TRUE)[[1]]
weighted.mean(e[,1], e[,2])
#[1] 5.269565
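For comparison, you can also call exactextractr directly on the same objects (a sketch, assuming the exactextractr and sf packages are installed; exact_extract computes a coverage-fraction weighted mean):
library(exactextractr)
library(sf)
# coverage-fraction weighted mean over the polygon
exact_extract(ras, st_as_sf(sp), "mean")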
To see how many cells are used:
zonal.stats(sp, ras, stats="counter")
# counter.layer
#1 6
extract(ras, sp, fun=function(x,...)length(x))
# [,1]
#[1,] 3
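To see which cells each approach picks up, you can use raster::cellFromPolygon on the same objects (a sketch; by default it selects cells whose centers fall inside the polygon, which is also what extract does):
# cells whose center falls inside the polygon (the default used by extract)
cellFromPolygon(ras, sp)
# also include partially covered cells, with approximate coverage weights
cellFromPolygon(ras, sp, weights=TRUE)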
One way to look at this is to create higher-resolution raster data.
10x higher resolution, same values:
ras <- disaggregate(ras, 10)
zonal.stats(sp, ras, stats="mean")
# mean.layer
#1 5.5
extract(ras, sp, fun=mean)
# [,1]
#[1,] 5.245614
zonal.stats(sp, ras, stats="counter")
# counter.layer
#1 218
extract(ras, sp, fun=function(x,...)length(x))
# [,1]
#[1,] 171
100x higher resolution, same values:
ras <- disaggregate(ras, 10)
zonal.stats(sp, ras, stats="mean")
#mean.layer
#1 5.299915
extract(ras, sp, small=TRUE, fun=mean)
# [,1]
#[1,] 5.271039
zonal.stats(sp, ras, stats="counter")
# counter.layer
#1 17695
extract(ras, sp, fun=function(x,...)length(x))
# [,1]
#[1,] 17289
At the highest resolution the mean values are similar (and the relative difference in the number of cells is small); but raster was closer to the correct value (whatever that is, exactly) at a lower resolution (and also with the weighted mean). That is unexpected.
For better speed, there is now also the terra package:
library(terra)
r <- rast(ras)
v <- vect(sp)
extract(r, v, "mean")
# ID layer
#[1,] 1 5.271039
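terra::extract also has weights and exact arguments; with exact=TRUE it should use the precise fraction of each cell covered by the polygon (a sketch, not run here):
# area-weighted mean using exact cell coverage fractions
extract(r, v, mean, exact=TRUE)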