I'm using some spatial data in R, and wondering whether use the packages/functions that rely on the old Spatial format (package sp) or the new package sf. I made this test, based on code found here.
The idea is to "identify all points falling within a maximum distance of xx meters with respect to each single point in a spatial points dataset".
library(tictoc)
# define a buffer distance and a toy data
maxdist <- 500
df<-data.frame(x = runif(10000, 0, 100000),y = runif(10000, 0, 100000),id = 1:10000)
# doing the analysis using sf
library(sf)
tic("sf")
pts <- df %>% st_as_sf(coords = c("x", "y"))
pts_buf <- st_buffer(pts, maxdist,nQuadSegs = 5)
int <- st_intersects(pts_buf, pts)
toc()
# doing the analysis using sp
library(sp)
library(rgeos)
tic("sp")
pts2 <- SpatialPointsDataFrame(df[,1:2],as.data.frame(df[,3]))
pts_buf2 <- gBuffer(pts2, byid=TRUE,width=maxdist)
int2 <- over(pts_buf2, pts2,returnList = TRUE)
toc()
# size of the objects
object.size(pts)<object.size(pts2)
object.size(pts_buf)<object.size(pts_buf2)
Using sf seems to be much better as faster (around 0.53 vs 2.1 seconds in my machine) and requiring less memory. There is one exception though. Why object pts is much larger than pts2? Is sf less efficient in storing a vector of points?
Using sf seems to be much better as faster (around 0.53 vs 2.1 seconds in my machine) and requiring less memory.
The SF (or Simple Features) library is a big change in how R handles spatial data. Back in the 'old days', we used a package called SP to manage spatial data in R. It was initially developed in 2005, and was a very well-developed package that supported practically all GIS analysis.
At its most basic, an sf object is a collection of simple features that includes attributes and geometries in the form of a data frame. In other words, it is a data frame (or tibble) with rows of features, columns of attributes, and a special geometry column that contains the spatial aspects of the features.
One reason I can think of:
The pts
(sf
) object keeps an attribute specifying the 'type' of geometry object for every row. This is because every row has the potential to be a different geometry in an sf
object.
Whereas the pts2
(sp
) is in an object of class SpatialPointsDataFrame
. It can only hold points, so there's no need to keep the extra attribute for each 'row'
Take the first two rows of the sf
object as an example, these are the attributes associated with those geometries
lapply(1:2, function(x) attributes(st_geometry(pts[x, ])))
# [[1]]
# [[1]]$names
# [1] "1"
#
# [[1]]$class
# [1] "sfc_POINT" "sfc"
#
# [[1]]$precision
# [1] 0
#
# [[1]]$bbox
# xmin ymin xmax ymax
# 81647.16 72283.90 81647.16 72283.90
#
# [[1]]$crs
# Coordinate Reference System: NA
#
# [[1]]$n_empty
# [1] 0
#
#
# [[2]]
# [[2]]$names
# [1] "2"
#
# [[2]]$class
# [1] "sfc_POINT" "sfc"
#
# [[2]]$precision
# [1] 0
#
# [[2]]$bbox
# xmin ymin xmax ymax
# 5591.116 38967.060 5591.116 38967.060
#
# [[2]]$crs
# Coordinate Reference System: NA
#
# [[2]]$n_empty
# [1] 0
Every row of this sf
data.frame will have similar attributes.
If you strip the attributes from the geometries (to just leave a data.frame)
pts_coords <- st_coordinates(pts)
pts_striped <- pts
st_geometry(pts_striped) <- NULL
pts_striped <- cbind(pts_striped, pts_coords)
And then compare the object sizes
object.size(pts_striped)
# 200896
object.size(pts2)
# 203624
The objects are much closer aligned in size
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With