Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

spatial join on two simple features {sf} with over 1 mil. entries as fast as possible

I hope this is not too trivial but I really can't find an answer and I'm too new to the topic to come up with alternatives myself. So here is the Problem:

I have two shapefiles x and y that represent different processing levels of a Sentinel2 satellite image. x contains about 1.300.000 polygons/Segments completely covering the image extend without any further vital information.
y contains about 500 polygons representing the cloud-free area of the image (also covering most of the image except for a few "cloud-holes") as well as information about the used image in 4 columns (Sensor, Time...)

I'm trying to add the image information to x in places x is covered by y. pretty simple? I just can't find a way to make it happen without taking days.

I read x in as a simple feature {sf}, as reading it with shapefile / readOGR takes ages. I tried different things with y

when I try merge(x,y) I can only take one sf as merge doesn't support two sf's. merging x (as sf) and y (as shp) gives me the error "cannot allocate vector of size 13.0 Gb"

so I tried sf::st_join(x,y), which supports both Variables to be sf but still didn't finish for 28 hours now

sf::st_intersect(x,y) took about 9 minutes for a 10.000 segment subset, so that might not be a lot faster for the whole piece.

could subsetting x to a few smaller pieces solve the whole thing or is there another simple solution? could I do something with my workspace to make the merge work or is there simply no shortcut to joining that amount of polygons?

Thanks a lot in advance and I hope my description isn't too fuzzy!

my tiny work station:
win 7 64 bit
8 GB RAM
intel i7-4790 @ 3,6 GHz

like image 337
Matthias_Stack Avatar asked Apr 05 '17 11:04

Matthias_Stack


People also ask

What does spatial join mean?

A spatial join involves matching rows from the Join Features to the Target Features based on their relative spatial locations. By default, all attributes of the join features are appended to attributes of the target features and copied over to the output feature class.

How do you perform a spatial join in Arcgis?

You can perform a join with either the Join Data dialog box, accessed by right-clicking a layer in ArcMap, or a geoprocessing tool. You should use the Spatial Join tool rather than the dialog box if you are performing spatial joins with large or complex datasets.

What are the main types of geographical simple feature objects available in the R sf package?

Geometries are the basic building blocks of simple features. Simple features in R can take on one of the 17 geometry types supported by the sf package. In this chapter we will focus on the seven most commonly used types: POINT , LINESTRING , POLYGON , MULTIPOINT , MULTILINESTRING , MULTIPOLYGON and GEOMETRYCOLLECTION .


1 Answers

I often face this kind of problems and as @manotheshark2 afirms, I prefer to work in a loop subseting my vector layer. Here is my advice:

Load your data

library(raster)
library(rgdal)
x <- readOGR('C:/', 'sentinelCovers')
y <- readOGR('C:/', 'cloudHoles')

Assign an y ID for identify which x polygons intersects y polygons and create the column in x table

x$xyID <- NA # Answer col
y$yID <- 1:nrow(y@data) # ID col

Run a loop subseting x

for (posX in 1:nrow(x@data)){
  pol.x <- x[posX, ]
  intX <- raster::intersect(pol.x, y)
  # x$xyID[posX] <- intX@data$yID ## Run this if there's unique y polygons
  # x$xyID[posX] <- paste0(intX@data$yID, collapse = ',') ## Run this if there's multiple y polygons
}

You can check if is better to run the loop on x o y layer

x$xyID <- NA # Answer col
x$xID <- 1:nrow(x@data) # ID Col

for (posY in 1:nrow(y@data)){
  pol.y <- y[posY, ]
  intY <- tryCatch(raster::intersect(pol.y, x), finally = 'NULL')
  if (is.null(intY)) next
  x$xyID[x@data$xID %in% intY@data$xID] <- pol.y$yID
}
like image 111
gonzalez.ivan90 Avatar answered Oct 22 '22 16:10

gonzalez.ivan90