Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing simple features {sf} and Spatial objects {sp}: speed and memory

I'm using some spatial data in R, and wondering whether use the packages/functions that rely on the old Spatial format (package sp) or the new package sf. I made this test, based on code found here.

The idea is to "identify all points falling within a maximum distance of xx meters with respect to each single point in a spatial points dataset".

library(tictoc)

# define a buffer distance and a toy data
maxdist <- 500
df<-data.frame(x = runif(10000, 0, 100000),y = runif(10000, 0, 100000),id = 1:10000)

# doing the analysis using sf
library(sf)
tic("sf")
pts     <- df %>% st_as_sf(coords = c("x", "y"))
pts_buf <- st_buffer(pts, maxdist,nQuadSegs = 5)
int     <- st_intersects(pts_buf, pts)
toc()

# doing the analysis using sp
library(sp)
library(rgeos)
tic("sp")
pts2      <- SpatialPointsDataFrame(df[,1:2],as.data.frame(df[,3]))
pts_buf2  <- gBuffer(pts2, byid=TRUE,width=maxdist)
int2      <- over(pts_buf2, pts2,returnList = TRUE)
toc()

# size of the objects
object.size(pts)<object.size(pts2)
object.size(pts_buf)<object.size(pts_buf2)

Using sf seems to be much better as faster (around 0.53 vs 2.1 seconds in my machine) and requiring less memory. There is one exception though. Why object pts is much larger than pts2? Is sf less efficient in storing a vector of points?

like image 667
Hugo Avatar asked Mar 13 '18 14:03

Hugo


People also ask

Is SF faster than SP?

Using sf seems to be much better as faster (around 0.53 vs 2.1 seconds in my machine) and requiring less memory.

What is the difference between SF and SP in R?

The SF (or Simple Features) library is a big change in how R handles spatial data. Back in the 'old days', we used a package called SP to manage spatial data in R. It was initially developed in 2005, and was a very well-developed package that supported practically all GIS analysis.

What is simple feature object?

At its most basic, an sf object is a collection of simple features that includes attributes and geometries in the form of a data frame. In other words, it is a data frame (or tibble) with rows of features, columns of attributes, and a special geometry column that contains the spatial aspects of the features.


1 Answers

One reason I can think of:

The pts (sf) object keeps an attribute specifying the 'type' of geometry object for every row. This is because every row has the potential to be a different geometry in an sf object.

Whereas the pts2 (sp) is in an object of class SpatialPointsDataFrame. It can only hold points, so there's no need to keep the extra attribute for each 'row'

Take the first two rows of the sf object as an example, these are the attributes associated with those geometries

lapply(1:2, function(x) attributes(st_geometry(pts[x, ])))
# [[1]]
# [[1]]$names
# [1] "1"
# 
# [[1]]$class
# [1] "sfc_POINT" "sfc"      
# 
# [[1]]$precision
# [1] 0
# 
# [[1]]$bbox
# xmin     ymin     xmax     ymax 
# 81647.16 72283.90 81647.16 72283.90 
# 
# [[1]]$crs
# Coordinate Reference System: NA
# 
# [[1]]$n_empty
# [1] 0
# 
# 
# [[2]]
# [[2]]$names
# [1] "2"
# 
# [[2]]$class
# [1] "sfc_POINT" "sfc"      
# 
# [[2]]$precision
# [1] 0
# 
# [[2]]$bbox
# xmin      ymin      xmax      ymax 
# 5591.116 38967.060  5591.116 38967.060 
# 
# [[2]]$crs
# Coordinate Reference System: NA
# 
# [[2]]$n_empty
# [1] 0

Every row of this sf data.frame will have similar attributes.

If you strip the attributes from the geometries (to just leave a data.frame)

pts_coords <- st_coordinates(pts)
pts_striped <- pts
st_geometry(pts_striped) <- NULL
pts_striped <- cbind(pts_striped, pts_coords)

And then compare the object sizes

object.size(pts_striped)
# 200896
object.size(pts2)
# 203624

The objects are much closer aligned in size

like image 142
SymbolixAU Avatar answered Sep 28 '22 19:09

SymbolixAU